Preface



In less than a decade of existence, the Web has reached a truly staggering stage, 
demonstrated by the scope, the reach, and the size of Web-based applications 
and activities. Concentrating initially on information dissemination, the scope of 
the applications is now limited only by our imagination. The reach is constantly 
expanding and so are the number and size of the applications, along with the 
underlying complexity, range of purposes, and the time needed to develop and 
maintain them. At the same time, the development and maintenance processes 
of Web applications have not progressed at a sufficiently rapid pace to meet these 
challenges and demands. Consequently, Web application development is likely to
run into a crisis, and it is not hard to imagine that this would dwarf the
‘software crisis’ identified in the 1960s.

Web Engineering aims to avert this potential crisis by generating a proactive 
approach to the successful development of Web-based systems and applications. 
Web Engineering involves the use of scientific, engineering, and management 
principles and systematic approaches with the aim of successfully developing, 
deploying, and maintaining high quality Web-based systems and applications. 

Web Engineering, in its current form, is an early attempt to identify the
significant issues and problems, and their solutions, in developing Web-based
applications. As we see it, Web Engineering is not yet established as a full discipline
nor has it developed an identifiable or stable form, since everything connected 
with the Web is still in a state of flux. One only has to look at the number of 
varied activities that the World Wide Web Consortium is engaged in to realise 
that a stable Web environment, and hence proven methods for developmental 
activities based on the Web, is still some distance away. 

Our early forays into the Web arena, with the constant excitement of new
developments and challenges, forcefully brought to mind our entry into the
computing field, almost three decades ago. At that time, compared to what the
technology could do, our efforts in computerizing payroll and accounting
applications seemed puny and disappointing. The Web, on the other hand,
did not seem shackled, in the way that early computing was, to these
bureaucratic and unimaginative ways of conducting human and organizational affairs.
It seemed that the organizational, spatial, and physical constraints were about
to loosen, if not disappear altogether.

And yet, when we looked around at the way Web sites and applications were
being developed, it seemed to us that the early pattern of haphazard
development, minimal testing, and lack of attention to maintenance issues that
characterised the ‘software crisis’ was still very much with us. It was as though
the ‘new generation’ insisted on making the same mistakes as its parents!

This feeling of déjà vu led us to question the nature of Web-based and
Web-related activities. Of course, we were not alone, as we soon discovered. The
result is what is being called Web Engineering, which was first introduced
in a workshop at the Seventh World Wide Web (WWW7) conference in Brisbane
in 1998. It has now become a series, with more workshops at WWW8 (Toronto,
1999) and WWW9 (Amsterdam, 2000), and also at the International Conference
on Software Engineering (ICSE99) in 1999 in Los Angeles and ICSE2000 in
Limerick, Ireland. Another workshop is scheduled for WWW10 in Hong Kong
in May 2001.

The main purpose behind these workshops has been to share and pool the 
collective experience of people, both academics and practitioners, who are
actively working on Web-based systems. The workshops have generally consisted
of keynote addresses, peer-reviewed contributed papers, and sessions of open 
discussions. 



About This Book 

In this volume, we provide a consolidated view of recent work, highlighting
developments and advances in the area of Web Engineering. This selection of papers
draws mainly from the last three workshops, held in conjunction with ICSE1999,
WWW9, and ICSE2000. We also present a list of additional, useful resources on
Web Engineering such as books, special issues, articles, and Web sites. Our aim
is to provide a book that will be a convenient and useful reference for all
researchers, practitioners, and students interested in Web application
development.

Web Engineering takes its inspiration from Software Engineering. At the
same time, it is also an explicit acknowledgement of the multi-dimensional nature
of Web applications, encompassing technical computing, information structuring,
navigation and management, network performance and security, legal and
social issues, graphic design, multiplicity of user profiles, and the varied
operational environments. Accordingly, the papers in this volume cover perspectives
on Web Engineering, navigation and adaptivity, design aspects, acceptance
criteria for Web-based systems, development and management of Web sites and
Web-based applications, Web metrics, and case studies.

For convenience, the papers are organized in five sections: 1) Introduction 
and Perspectives, 2) Managing Information on the Web, 3) Web-Based Systems 
Development, 4) Design for Performance, Web Metrics, and Testing, and 5) 
Web Maintenance and Reuse. In their own ways, all the papers are
forward-looking, trying to anticipate problems, creating tools, experimenting in novel
ways, widening the areas of applications, and re-examining paradigms. In other 
words, the papers represent a shared attitude of being inclusive rather than 
focusing narrowly. 

Web Engineering is a forward-looking and collaborative discipline. The
papers in this compendium, taken individually, represent only the tip of the iceberg
of worldwide Web development. Together, they make a significant contribution
to the evolution of a more systematic approach to Web development. The
compendium has been made possible by the many people who share these views. We
hope the readers will join us in these endeavors.



January 2001 



San Murugesan 
Yogesh Deshpande 
Web Engineering: Introduction and Perspectives



1 Overview 

This section addresses some of the most fundamental themes of Web Engineering. In
the context of identification and promotion of Web Engineering as a new discipline
for the development of Web-based systems and applications, several questions 
naturally arise: What is Web Engineering? What is its place among all the other 
disciplines? Why is it being put forward as a new discipline? When is it needed? Is 
there an illustrative case study that would highlight the arguments? 

The three papers in this section answer these questions. The first two papers 
originate from the early exposure of the authors to Web developmental activities, 
which helped to define the field of Web Engineering and to bring a focus on areas that 
are not regarded as part of the traditional domains of computer science, information 
systems and software engineering. The third paper reports on the development of Web 
sites and Web-based applications that were undertaken consciously after most of the 
arguments for Web Engineering had been articulated earlier by the first two papers. 

Thus, this section not only provides a foundation for Web Engineering, but also
highlights the experiences and benefits gained by practicing Web Engineering
principles and methodologies in the development of Web-based applications. This
section thereby reflects an early cycle in the evolutionary nature of Web application
development: theory, practice, and feedback and refinement.

The first paper, Web Engineering: A New Discipline for Web-Based System
Development, defines Web Engineering and argues that it is different
from software engineering, identifying nine specific areas of difference
between the two disciplines. Web Engineering is concerned with the establishment and
use of sound scientific, engineering and management principles and disciplined and
systematic approaches to the successful development, deployment and maintenance of
high quality Web-based systems and applications. In essence, Web Engineering
covers multidisciplinary activities. It is not a limiting and rigid concept and it is
compatible with other metaphors such as ‘gardening’ used by other authors for
Web-based system development.

The second paper, Web Engineering: Beyond CS, IS and SE - Evolutionary and
Non-Engineering Perspectives, deals with Web Engineering from an evolutionary and
non-engineering point of view, arguing that development of Web-based systems, and
hence Web Engineering, requires much more than expertise in computer science,
information systems and software engineering as they are currently understood. The
paper highlights three major influences shaping Web Engineering: 1) mass
participation and the rise in end-user applications and decision support systems; 2)
evolution of systems development methodologies and their uptake or lack of it; and 3)
the development of novel distributed systems which reach and cater to people beyond
any specific organisation, raising legal, social, ethical and privacy concerns. The
paper observes that, unlike other traditional engineering disciplines, Web Engineering
as yet does not have ‘an established base of knowledge’ and hence has to be more
forward-looking, drawing upon the lessons of all the other relevant disciplines.

The third paper, Web Engineering in Action, is a case study of Web development 
within a large organisation, supporting and elaborating on Web Engineering. The 
paper argues that development of Web sites and applications is a process and not a 
one-off event. Such development will have a start but not a predictable end. 
Consequently, we need to identify various activities within this process that will have 
defined start and finish points, in order to systematise and manage the overall 
development. The Web environment is also characterised by changes in purposes over 
time, thus necessitating repetitions of the activities. Without a well-defined systematic 
approach, it is impossible for a group of people to effectively work together on a large 
Web development project. 




Web Engineering: A New Discipline for Development of
Web-Based Systems



San Murugesan, Yogesh Deshpande, Steve Hansen and Athula Ginige 

Dept of Computing and Information Systems, University of Western Sydney, 
Macarthur, Campbelltown NSW 2560, Australia 
{s.murugesan, y.deshpande, s.hansen, a.ginige}@uws.edu.au



Abstract. In most cases, development of Web-based systems has been
ad hoc, lacking a systematic approach and quality control and assurance
procedures. Hence, there is now legitimate and growing concern about
the manner in which Web-based systems are developed and their
quality and integrity. Web Engineering, an emerging new discipline,
advocates a process and a systematic approach to development of high
quality Web-based systems. It promotes the establishment and use of
sound scientific, engineering and management principles, and
disciplined and systematic approaches to development, deployment and
maintenance of Web-based systems. This paper gives an introductory
overview of Web Engineering. It presents the principles and roles of
Web Engineering, assesses the similarities and differences between 
development of traditional software and Web-based systems, and 
identifies key Web engineering activities. It also highlights the 
prospects of Web engineering and the areas that need further study. 

Keywords: Web engineering, Web-based systems development, Web 
crisis, Web design, Web development, Web lifecycle 



1 Introduction 

The growth of the Internet, Intranets, Extranets, and the World Wide Web has had 
significant impact on business, commerce, industry, banking and finance, education, 
government and entertainment sectors, and our personal and working lives. Many 
legacy information and database systems are being migrated to Internet and Web 
environments. Electronic commerce through the Internet is rapidly growing, cutting 
across national boundaries. A wide range of new, complex distributed applications is 
emerging in the Web environment because of the popularity and ubiquity of the Web 
itself and the nature of its features: it provides an information representation that 
supports interlinking of all kinds of content, easy access for end-users, and easy 
content creation using widely available tools. 

In most cases, however, the development approach used for Web-based systems 
has been ad hoc, and many Web-based systems have been kept running through a 
continual stream of patches. Overall, Web-based systems development has lacked
rigour, a systematic approach, and quality control and assurance. As the complexity
and sophistication of Web-based applications grow, there is now legitimate and
growing concern about the manner in which they are created and their quality and
integrity.

In the absence of a disciplined process for developing Web-based systems, we may
face serious problems in the successful development, deployment, operation and
'maintenance' of these systems. Poorly developed Web-based applications, which are
now proliferating, have a high probability of failure. Worse, as Web-based systems grow
more complex, a failure in one system or function can propagate broad-based
problems across many systems and/or functions. When this happens, confidence in
the Web may be shaken irreparably, which may cause a Web crisis [1]. The potential
Web crisis could be more serious and widespread than the software crisis that
software developers have been facing [2].

In order to avoid a possible Web crisis and achieve greater success in development 
and applications of complex Web-based systems, there is a pressing need for 
disciplined approaches and new methods and tools for development, deployment and 
evaluation of Web-based systems. Such approaches and techniques must take into
account: 1) the unique features of the Web, 2) the operational environments of
Web-based systems, 3) scenarios and multiplicity of user profiles, and 4) the diverse
types (and skills and knowledge) of people involved in building Web-based systems. These
pose additional challenges to Web-based application development.

Motivated by the concern among some Web-based systems developers (including 
the authors) about the chaotic way in which most Web-based systems are developed, 
a few new initiatives were undertaken to address the problems of Web-based systems 
development and bring the potential chaos under control, and to facilitate successful 
Web-based systems development [3-7]. These initiatives have promoted Web 
engineering as a discipline. 

Web Engineering is concerned with establishment and use of sound scientific, 
engineering and management principles and disciplined and systematic approaches to 
the successful development, deployment and maintenance of high quality Web-based 
systems and applications. 

It incorporates some of the well-known and successful traditional software
‘engineering’ principles and practices, adapting them to the more open and flexible
nature of the Web and to the type of Web application. It also takes into consideration
other elements that are specific to the Web environment.

We organised the first workshop on Web Engineering in 1998 [3] in conjunction
with the World Wide Web Conference (WWW7) in Brisbane, Australia, to address the
state of Web-based systems development and to promote Web engineering
approaches. Building on the success and outcome of the first workshop [3], two more
workshops on Web engineering were organised in 1999 [5, 6] to review practices in
Web-based systems development and the progress in this area, and to chart directions
for further study. The IEEE Software magazine [4] presented an interesting
roundtable discussion on “Can Internet-Based Applications be Engineered?” A
few Web engineering related articles [7-14] were also published. These stimulated a growing
interest in Web Engineering - a new discipline and approach for successful
Web-based systems development.

This paper gives an introductory overview of Web Engineering in order to
promote this new discipline among Web-based systems developers, researchers,
academics and students.
The paper assesses the problems of Web-based systems development as it is
currently practiced and argues the need for adopting Web Engineering approaches for
developing scalable, high-quality, large-scale Web-based systems. It outlines the principles
and roles of Web Engineering and also assesses the similarities and differences
between development of traditional software and Web-based systems, and between
software engineering and Web engineering. The paper then identifies key
Web engineering activities and highlights approaches and methods for systematic
development of Web-based applications, reviewing ongoing work in this area. Finally,
the paper discusses the prospects of Web engineering and highlights the areas that
need further study and development.

[For an updated list of some useful resources on Web engineering, such as books,
articles, workshop proceedings and Web sites, see pages 363-365.]

2 Ad Hoc Approaches and Concerns 

The Web has evolved very rapidly into a global environment for delivering all kinds 
of applications, ranging from small-scale, short-lived services to large-scale enterprise 
applications widely distributed across the Internet and corporate intranets. Tracking
the Internet’s global diffusion [15], and its influences and impact on society at large, is
a daunting, perhaps impossible, task. According to an estimate [15],
commercial use accounts for 58% of Internet traffic, far exceeding the network’s
originally intended application in defense and research and development [16].

Development approaches used for Web-based systems have been ad hoc [3-14, 17].
Hardly any attention has been given to development methodologies, measurement and
evaluation techniques, application quality and project management. Further, most
current application development and management practices rely heavily on the
knowledge and experience of individual developers and their own development
practices. In addition, they lack proper testing of Web-based systems, and the
documentation needed for ‘maintenance and upgrade’ of the systems, among
other needs.

Problems of Web-based systems development can partly be attributed to the nature
and rapid growth and evolution of the Web, the boom in Web and Web-related
technologies, the commercialisation of the Web, the rush to “be on the Web” and the
desire (or need) to migrate legacy systems to Web environments. Also, the
complexity of Web-based applications has grown significantly - from information
dissemination (ranging from simple text and images to image maps, forms, CGI,
applets, scripts and stylesheets) to online transactions, enterprise-wide planning and
scheduling systems and Web-based collaborative work environments. The complexity
of Web-based systems, however, is generally underestimated.

The Web’s legacy as an information medium rather than an application medium is
another cause of the problem. Many developers, clients and managers, as well as
academics, still consider Web development primarily as an authoring activity rather
than an application development activity to which some of the well-known software
engineering and management principles and practices could apply - of course with
some changes and fine tuning to suit the Web environment. Web-based systems
development is a process - “it is more than media manipulation and presentation
creations - it includes analysis of needs, design, management, metrics, maintenance,
etc [11]”.

Many attributes of quality Web-based systems such as ease of navigation,
accessibility, scalability, maintainability, usability, compatibility and interoperability,
security, readability, and reliability are often not given due consideration during
development. Many developers seem to be unaware of the real issues and challenges
facing major Web-based application development and its continual maintenance.

There is now legitimate and growing concern about the ad hoc manner in which
most Web-based systems are currently created and about their long-term quality and
integrity. Further, the greater sophistication and complexity of some Web-based
applications bring many new challenges that need to be satisfactorily addressed.

To address these concerns and challenges, we first need to create an awareness of
the need for more disciplined approaches to Web-based application development and
to move from the current, largely ad hoc (and personalised) approach to a
disciplined approach and process. Importantly, we also need to realise that
Web-based systems development is not just graphic design or content development any
more; there is a growing number of complex applications - intranet-based
applications, transactional systems, and other e-business applications. “There is more
to Web site than visual design and user interface. Web sites are becoming more like
programmes, less like static documents” [9]. Hence Web-based systems
developments are becoming more like major [software/IT] projects, and less like
works of art.



3 Web Engineering: The Need and Principles 

In the absence of a disciplined approach to Web-based systems development, we will 
find sooner or later that: 

a) Web-based applications are not delivering desired performance and quality.

b) Web application development process becomes increasingly complex and
difficult to manage and refine and also expensive and grossly behind
schedule.

Web Engineering, an emerging new discipline, advocates a process and a 
systematic approach to development of high quality Internet- and Web-based systems. 
We provide a broad and objective definition of Web engineering as follows. 

Web engineering is the establishment and use of sound scientific, engineering and 
management principles and disciplined and systematic approaches to the successful 
development, deployment and maintenance of high quality Web-based systems and 
applications. 

Web engineering principles and approaches can bring the potential chaos in
Web-based systems development under control, minimise risks, and enhance
maintainability and quality.



3.1 Web Engineering and Gardening Metaphor 

Many Web-based systems call for continual update or refinement, and hence
Web-based systems development may be considered “continuous, with fine grained
evolution, without specific releases as with software.” Thus, Web-based systems
development is like gardening [8, 18]. Like a garden, a Web-based system will
continue to evolve, change and grow. Hence, a good initial infrastructure is required
to allow the growth to occur in a controlled, but flexible and consistent manner, and to
foster creativity, refinement and change.

The gardening metaphor for Web application development raises the question
of whether an engineering approach is appropriate. We believe Web engineering is
appropriate for Web application development and, in support of this view, cite the
relationship between horticultural engineering and gardening. Engineering principles
and approaches can be adapted to the Web environment to provide the flexibility required to
work within a framework and allow creative development. They are not as ‘rigid’ as
some perceive them to be on the basis of ‘traditional engineering’ approaches,
and they allow creativity and personalisation to blossom within a framework. In fact, all
that Web engineering advocates is the “use of sound scientific, engineering and
management principles and disciplined and systematic approaches to the successful
development, deployment and maintenance of high quality Web-based systems and
applications.”



3.2 Web Engineering and Software Engineering 

Though Web engineering involves some programming and software development,
and adopts some of the principles of software engineering, Web-based systems
development is different from software development, and Web engineering is
different from software engineering:

1. Most Web-based systems, at least as of now, are document-oriented containing 
static or dynamic Web pages. 

2. Web-based systems will continue to be focussed on look and feel, favouring 
visual creativity and incorporation of multimedia (in varying degrees) in 
presentation and interface. More emphasis will be placed on visual creativity and 
presentation in front-end user interfaces. 

3. Most Web-based systems will continue to be content-driven; often Web-based 
systems development includes development of the content presented. 

4. Most Web-based systems need to cater to users with diverse skills and capability, 
complicating human-computer interaction, user interface and information 
presentation to a multiplicity of user profiles. 

5. The nature and characteristics of the Web as an application medium as well as a 
delivery medium is not yet well understood. 

6. The Web exemplifies a greater bond between art and science than generally 
encountered in software development. 

7. Most Web-based systems need to be developed within a short time, making it 
difficult to apply the same level of formal planning and testing as used in 
software development. 

8. The Web differs from software with respect to the delivery medium. Traditional
software generally operates in a well-defined environment, whereas
Web-based systems, at the user end, have to cater to diverse environments.

9. The individuals who build/develop Web-based systems vary widely
in their background, skills, knowledge and systems understanding, as well as in
their perception of the Web and of what constitutes a quality Web-based system.



3.3 Web Engineering: A Multidisciplinary Field

As Powell [9] writes, Web-based systems “involve a mixture between print 
publishing and software development, between marketing and computing, between 
internal communications and external relations, and between art and technology.” 
Because of the nature and characteristics of Web-based applications and their 
development, Web engineering needs to be a multidisciplinary field, encompassing 
inputs from diverse areas such as human-computer interaction, user interface, systems 
analysis and design, software engineering, requirements engineering, hypermedia 
engineering, information structures, testing, modeling and simulation and project 
management, as well as social sciences, arts and graphic design (Figure 1). 



[Figure 1 labels: Software Engineering; Multimedia; Human-Computer Interaction; Testing; Project Management; Hypertext; Modeling and Simulation; Requirements Engineering; System Analysis and Design]


Figure 1. Web Engineering - a multidisciplinary field 



3.4 Web Engineering Activities 

Web development is a process, not simply a one-off event. Thus, Web Engineering 
deals with all aspects of Web-based systems development, starting from conception 
and development to implementation, performance evaluation, and continual 
maintenance. 

Major Web Engineering activities include: 

■ Requirements specification and analysis 

■ Web-based systems development methodologies and techniques 

■ Migration of legacy systems to Web environments 

■ Web-based real-time applications development 

■ Testing, verification and validation 

■ Quality assessment, control and assurance 

■ Configuration and project management 

■ "Web metrics" - generating metrics for estimation of development efforts 

■ Performance specification and evaluation 

■ Update and maintenance 

■ Development models, teams, staffing 

■ Integration with legacy systems

■ Human and cultural aspects

■ User-centric development, user modeling and user involvement and 
feedback 

■ End-user application development 

■ Education and training 



4 Web-Based Systems Development 

While Web engineering activities span the entire Web lifecycle from conception of an
application to development and deployment, and continual refinement and
update/upgrade of systems, in the following we highlight some of the [early]
developments in this area, presented in this volume as well as elsewhere. This is,
however, not intended to be an extensive survey or critical review of these
developments.



4.1 Web Development Process Models 

To better manage Web-based systems design and development, and to do it in a
systematic and repeatable manner, we need a process that outlines the various phases
of Web-based systems development. Some aspects that make Web-based systems
development difficult include complexity, changeability, invisibility and unrealistic,
narrow schedules [10]. A process model should help developers “to address the
complexities of Web-based systems, minimise risks of development, deal with
likelihood of change, and deliver the site quickly, while providing feedback for
management as the project goes along” [10]. Further, the progress of Web-based
development should be monitorable and trackable. The process, besides being easy to
apply, should facilitate continual update/refinement and evolution, based on feedback
from users/clients. For information on some of the hypermedia/Web development
process models see [9-14]. An object-oriented model for the Web application
development process, which uses XML technology to support modularity and reuse of
Web documents, is described in [19].



4.2 Analysis and Web Design 

Requirement analysis and Web-based systems design are very important activities and 
call for a systematic and disciplined approach. Web systems design considerations 
and approaches are discussed in [9, 20-23]. 

Object Orientation in Web-Based Systems. Integration of Web and object
technologies offers a foundation for expanding the Web to a new generation of
applications. According to Manolo [24], the Web must improve its data structuring
capabilities, and integrate aspects of object technology with the basic infrastructure of
the Web. He argues that if the Web is to support complex enterprise applications, it
must support generic capabilities similar to those provided by the OMA (object
management architecture), but adapted to the more open, flexible nature of the Web
and to the specific requirements of Web applications. Technologies for a Web object
model are described in [24].

Usability and User-Centered Designs. Effective Web site design requires
consideration of usability. Web-based systems need to be designed for easy
navigation, and they also need to be attractive and useful [25]. User-centered design
methods for Web sites are presented in [26], while [27] presents a user-centric
approach to modeling Web information systems.



4.3 Testing of Web-Based Systems 

Testing, verification and validation (V & V) of Web-based systems is an important
and challenging task, but it receives very little attention from Web developers.
Web-based systems testing differs from conventional software testing and poses new
challenges. A Web-based system needs to be tested not only to check and verify
whether it does what it is designed to do but also to evaluate how well it performs in
(different) Web client environments. Importantly, it needs to be tested for security
and also for usability from the users’ perspective. However, the unpredictability of the
Internet and Web medium makes testing of Web-based systems difficult. We need to
develop new approaches and techniques for testing and evaluation of complex
Web-based systems. For a brief overview of Web systems testing see Chapter 8 in [9]
and [28-30].



4.4 Management of Large Web Sites 

Management of large Web sites is a difficult task, especially in the midst of change, 
which is a fact of life in the Web environment. Requirements for the management of 
large Web sites, together with tools and a mechanism for organising and manipulating 
such sites, are described in [31]. 

Web Configuration Management. Web-based systems undergo changes, often 
frequent and extensive, during both development and operation. The 
changes range from trivial to large-scale changes of information/data and 
requirements, and they need to be handled in a rational, controlled manner. 
Web configuration management (WCM) encompasses a set of activities for 
controlling and facilitating change: identification, version control, change control, 
auditing and reporting. It can adapt commonly practised software configuration 
management (SCM) concepts, principles and approaches to the Web environment. 
Dart [32] discusses how software configuration management techniques and practices 
could be used for WCM and to contain the Web Crisis. 
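
The five WCM activities named above can be illustrated with a minimal sketch. It is 
not taken from [32]; the class names ChangeRequest and PageStore, the approval rule 
and the use of a content hash for identification are all assumptions made purely 
for illustration. 

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed change to one Web page (hypothetical structure)."""
    path: str
    new_content: str
    author: str
    approved: bool = False


@dataclass
class PageStore:
    """In-memory store illustrating the WCM activities."""
    versions: dict = field(default_factory=dict)   # path -> [(digest, content), ...]
    audit_log: list = field(default_factory=list)  # (event, path, person)

    def approve(self, request, reviewer):
        # Change control: a change must be approved before it is committed.
        request.approved = True
        self.audit_log.append(("approved", request.path, reviewer))

    def commit(self, request):
        if not request.approved:
            raise PermissionError("change request has not been approved")
        # Identification: each version is identified by a hash of its content.
        digest = hashlib.sha256(request.new_content.encode("utf-8")).hexdigest()
        # Version control: keep the full history for every page.
        history = self.versions.setdefault(request.path, [])
        history.append((digest, request.new_content))
        # Auditing: record who committed what.
        self.audit_log.append(("committed", request.path, request.author))
        return len(history)

    def report(self):
        # Reporting: number of stored versions per page.
        return {path: len(history) for path, history in self.versions.items()}
```

In this sketch an unapproved request is rejected at commit time, so every stored 
version has a matching approval entry in the audit log. 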



4.5 Skills Hierarchy 

Large Web-based systems development requires a team of people with different 
skills, knowledge and capabilities. A categorisation of skills and knowledge-base 
hierarchy for participants in Web-based systems development is provided in [33] and 
also in this volume (pp 228-241). 




Web Engineering: A New Discipline for Development of Web-Based Systems 1 1 



4.6 Barriers to Web Technology Adoption 

Nambisan and Wang [34] identify three levels of adoption of Web technology: 
Level 1, information access; Level 2, work collaboration; and Level 3, core business 
transaction. They also identify three key areas of potential knowledge barriers to Web 
technology adoption: technology-related, project-related and application-related 
knowledge barriers. 



5 Areas of Further Study 

The Web engineering discipline is very young and has just started gaining the attention 
of researchers, developers, academics, and other major players in Web-based systems 
implementation such as customers/clients and their contract administrators. It needs to 
evolve and mature to handle effectively the new, unique challenges posed by Web-based 
systems and applications. We need to study and evaluate current approaches 
and practices, and develop new methods and techniques to address the challenges of 
developing large-scale Web-based systems. The areas that need further study include 
(not in any specific order): 

• Requirement analysis and systems design 

• Information modeling 

• Process and product models 

• Testing, verification and validation 

• Performance measures 

• Web metrics 

• Configuration and project management 

• User interface, ease of use 

• User-centric design, end-user development/personalisation 

• Quality control and assurance 

• Education and training 



6 Prospects of Web Engineering 

As we improve our ability to build Web-based systems, the systems we need to build 
are likely to get more complex. The quality requirements and features of these 
systems may also change with more emphasis on performance, correctness and 
availability of Web-based systems, as we will increasingly depend on Web-based 
systems in a number of critical applications, where the consequences and impact of 
errors and failures could be serious. Further, as systems become larger, a large team 
of developers with different types and levels of skills will be required, necessitating 
distributed collaborative development. As we move further into cyberspace and try to 
exploit some of the unrealised potential of the Internet and Web, there will be many 
new challenges and problems. Hopefully, new approaches and directions will be 
developed to meet the challenges and solve the problems we may face on our mission 
to build a better cyberspace for us. 

Successfully convincing developers of Web applications of the need for, and the 
benefits of, Web engineering approaches (if implemented thoughtfully) will go 
a long way towards reducing complexity and achieving successful development. 

Like the Web, which is dynamic and open, Web engineering needs to evolve 
rapidly, adapting to the changes, responding to the needs, shifting the emphasis as 
needed and following new paths. 



Abstract. With the advent of the World Wide Web, ‘computing’ has 
gone beyond the traditional computer science, information systems and 
software engineering. The Web has brought computing to far more 
people than computing professionals ever dealt with and led to 
mushrooming growth of Web-based applications. Implicitly, 
computing professionals are no longer the privileged intermediaries 
between computers and other people, as end-users and the technological 
advances take their toll. On the other hand, the new applications must 
still be developed in disciplined ways. Engineering embodies such 
disciplined methods. While the generic term engineering, meaning a 
systematic application of scientific knowledge in creating and building 
cost-effective solutions to practical problems, is integral to many 
disciplines, the term Web Engineering per se may not be widely 
understood or accepted at this stage. This paper elaborates on the 
concept of Web Engineering, relates it to computer science, software 
engineering and information systems, draws upon past experiences in 
software development and critically analyses it from the point of view 
of computing professionals who are not themselves engineers. 

Keywords: Web engineering, Web-based information systems, Web 
design, information management, Web-based applications, end-user 
computing, engineering, information systems 



1 Introduction 

With the advent of the World Wide Web, ‘computing’ has transcended the traditional 
computer science (CS), information systems (IS) and software engineering (SE) in the 
sense that these three fields were strictly within the domain of a computing 
professional whereas the Web reaches the "world". The spread of computing in every 
area of human endeavour had already started to marginalise the influence of 
computing professionals on how computers should be used. The Web has brought 
computing to, and involves, a far greater number of people than computing 
professionals ever dealt with until now. From being a very privileged minority 
mediating between very advanced technology and the general populace, the 
computing community has become, without any explicit acknowledgment, servants of 
both technology and the populace. 

The attraction of the Web and easy availability of development tools have also led 
to a mushrooming of Web sites and applications created by end-users, rather than by 
computing professionals. The growth in end-user computing, however, causes 
concern about the quality and reliability of Web applications [1, 2]. Interestingly, but 
not surprisingly, Web-based developments have aroused similar concern from other 
non-computing disciplines, such as graphic design and hypertext, which argue that the 
Web site and application developers are disregarding the traditions and proven 
techniques of those disciplines (see, for example, [3-5] for the concerns of graphic 
designers). 

Web-based applications are meant for a wider user base than traditional 
applications, whether within an organisation (intranets), across a number of 
organisations (extranets) or over the internet. Web-based application developers, 
however, do not necessarily know all the users [6-8]. Further, they must deal with a 
fast-changing technology and evolving standards, be conscious of the legal and 
security issues [9], and, at the same time, create aesthetically pleasing Web sites and 
pages. We are only beginning to realise the implications of this development; people 
and professionals from all walks of life are now trying to make sense of and 
communicate their ideas about what it all means and where it might lead with the 
‘generation net’ soon to become dominant [10]. The computing community as a 
whole is still busy battling, as ever, the advancing technology, scarcely realising 
that the field has widened, that far more non-computing people are involved and that 
it must change its tactics to accommodate both the technology and the rest of 
humanity. 

One way of dealing with new developments and shaping them is to use existing 
metaphors and classifications. If similarities can be discovered then known control 
mechanisms or their modifications can be utilised; if dissimilarities are identified then 
new methods can be devised to derive the maximum benefits or minimise risks. 
Engineering is one such metaphor which, being familiar and generally trusted, is 
frequently and optimistically used to signify an orderly development in a new field. 
Some of the latest additions to engineering have appeared in the form of hypermedia 
engineering [2], Web site engineering [11], knowledge engineering, and document 
engineering. While the generic term engineering, meaning a systematic application 
of scientific knowledge in creating and building cost-effective solutions to practical 
problems [12], is integral to many disciplines, its use in the context of Web-based 
application development needs to be widely discussed [13, 14]. Given the frequent 
usage of the term engineering, as listed above, in addition to software engineering, 
computer engineering and systems engineering, we have to establish the need for the 
engineering approach to Web-based application development and what constitutes 
Web engineering. 

This paper advances the arguments and adds to the details in [1]. It further argues 
that we need to understand: a) the evolutionary place of Web engineering, in order to 
learn from the past, and b) how it may be seen, especially by non-engineers (within the 
field of computing) who tend to see engineering as a good discipline and yet 
somewhat limited in its scope and not universally applicable. 

Our first major experience of the Web was significantly instructive in the light of 
the remarks made above. Having come from a mainstream computing background, 
we found the Web an entirely amazing experience while the IT service personnel, at 
the same demonstration, were profoundly upset by it. With hindsight, we now 
understand that they saw it as a thoroughly unsettling experience, beyond their current 
competence and budget, while we saw new and challenging territories, ready for 
exploration. Fortuitously, by its timing, this first exposure to the Web made us aware, 
right at the beginning, of the possible multidimensional nature (technical, 
administrative, and managerial, in addition to the research and academic dimensions) 
of the likely effects of new technology, especially the Web. Consequently, when we 
formed a research group to carry out work in Web-based systems, the membership of 
the Group was derived consciously from academic, administrative and end-user 
backgrounds. We also actively developed Web sites and applications as well as 
participating at policy-making levels. 

The overall thrust of our work and discussions led us to classify some of our work 
as ‘Web Engineering’ which found an expression in the first Web Engineering 
Workshop at the seventh World Wide Web Conference (WWW7) in Brisbane in 1998 
[13]. Subsequently, we co-sponsored another workshop, at the eighth World Wide 
Web Conference in Toronto in 1999 [15] and organised three more workshops, one at 
WWW9 in Amsterdam in 2000 and two at the International Conference on Software 
Engineering, Los Angeles in 1999 and Limerick, Ireland in 2000 [16]. A sixth 
workshop is scheduled for WWW10 in Hong Kong in 2001. 

The Group members come from several academic disciplines, including computer 
science, operational research, business computing, electronics engineering and 
physics. There is a healthy debate within the Group about differing viewpoints 
including, in this context, both engineering and non-engineering ones. This paper 
articulates an evolutionary and non-engineering point of view of Web Engineering, 
discussing its strengths and weaknesses, and speculating about its prospects. 
Murugesan et al. [1] elaborate on the concepts behind Web Engineering in detail. The 
reader is urged to read both papers for a greater appreciation of this new field. 

The paper is organised as follows. Section 2 briefly describes the Group’s focus 
and activities. Section 3 places Web-based application development in the 
evolutionary context of information technology (IT). Section 4 discusses what is 
generally meant by ‘engineering’, how it is perceived within the computing sector and 
how it may apply to Web-based application development. Section 5 identifies the 
issues that contribute to Web engineering. Section 6 speculates about the near-term 
developments in Web engineering and Section 7 concludes the paper. 



2 Background to Web Engineering 

The Web-based Information Systems and Methodologies (WebISM) Research Group 
at the University of Western Sydney Macarthur, Australia, was formally brought 
together in early 1997. Informally, the Group members had been attracted to and 
started Web-related work in late 1995. From the beginning, our emphasis was to 
engage in practical work and learn from it. Two members became the Faculty 
Webmasters. The early experience highlighted issues of static versus dynamic pages, 
information structuring, reuse of existing documents and information and, rather 
surprisingly, of who ‘owned’ information and was responsible for its dissemination. 




Web Engineering: Beyond CS, IS and SE 17 



The issues of ownership and responsibility for Web sites and applications need to 
be explained briefly because they symbolise how Web-based developments transcend 
traditional boundaries of computing. 

Among the first pages to be put up on the Faculty site were those giving details of 
degree courses and subjects taught. As academics delivering some of the subjects 
ourselves, we saw nothing exceptional in these pages, especially since the material 
was taken from documents already in the public domain. The Office of Development 
and External Relations (ODER) at the University, however, raised strong objections. 
As the department responsible for the public face of the University, they considered 
the university calendar and details contained in it as belonging to them and hence not 
to be used without their permission. 

The dispute was settled fairly quickly and amicably and the Faculty Web pages 
were allowed to exist being deemed as within the purview of the faculty as well as 
ODER. The episode, however, brought home the fact that Web-based activities 
necessarily have a much broader context than simply another form of computing 
development. There are some easily identifiable and purely technical issues such as 
network computing and performance, distributed computing and databases, new 
standards and tools relating to the Web, and new programming languages. Apart 
from all these, however, Web site and application development also requires 
understanding of and skills in: site management and security, graphic design, 
document and link management, legal and copyright issues, academic freedom and 
privacy matters and, especially, users and evolving methods to deal with them. 

In order to discuss these aspects and shape the future work, it was necessary to find 
a term which described the area much more cogently to others. While the Web is 
“just another application of distributed computing” to computer scientists, Web-based 
application development involves much more than has been traditionally ascribed to 
general computing. One compromise, after much debate, was to call the field ‘Web 
Engineering’. Part of this debate and the ensuing understanding are described in 
Sections 3 and 4. Section 3 puts the Web-based developmental activities in an 
evolutionary context, while Section 4 discusses the engineering rubric. 



3 Evolutionary Context of Web Engineering 

The evolution of computing over the last fifty years and its continuing progression are 
well documented and commented upon by many people (for a stimulating treatment 
of this topic, see [17]). This section covers neither the familiar ground of how 
computing professionals moved from a focus predominantly on programming to 
computer science on to information systems nor the scale of change in hardware and 
software platforms over this period. 

The section highlights three main aspects of the evolution, in the context of Web 
engineering: 

1. the effects of the spread of IT and end-user computing 

2. the relatively low adoption of methodologies in developing and delivering 
good-quality IT applications 

3. technologies associated with the continuing development of the Web. 

The most visible and significant aspect of the evolution is the ubiquity of 
computing; a large number of non-IT people now use IT extensively and know a lot 
about it. The rise in end-user applications and decision support systems is a recent 
phenomenon flowing from the dependence on IT to store and process data and 
information. This contrasts with the first three and a half decades or so of IT evolution, 
when IT had a more exclusive aura about it even as it continued to spawn different 
specialisations within the general computing field. This exclusivity was dented by 
desktop publishing and personal computers (PC), especially because these 
developments were very different from what the IT professionals were used to and 
because they could not be controlled like mainframes and minicomputers. 
The mass participation in IT use and application generation made feasible 
by the PC revolution did lead to concerns about the quality of applications developed 
by non-IT people. However, IT professionals had more pressing concerns with the 
backlog of applications and the constant changes in technology. The net effect was 
that the end-users did not benefit from the cumulative experience of IT professionals 
as much as they could have and the IT professionals found themselves somewhat 
marginalised. There were other significant contributory reasons as well, such as IT 
not aligning itself to the organisation’s goals and objectives and IT projects far 
exceeding development schedules and budgets. Nevertheless, it is arguable that end- 
user development took away the mystique and the protective veil of the application 
developers without improving the quality and performance of new applications. 

The second aspect of the evolution is how different systems development 
methodologies have been proposed and used over the last three decades. There are 
many methodologies and techniques in vogue, such as Structured Systems Analysis 
and Design, Merise, and OO Analysis and Design, based on both academic research 
and practical experience. For an early proposal on development methodology, see 
[18]; [19] describes and discusses many methodologies. However, practical use of 
these development methodologies and techniques in real-world IT projects has been 
haphazard to non-existent (for early criticisms, see [20], [21]; for a specific 
survey, see [22]). As a result, the problem of software quality and reliability 
continues to haunt the IT community. This scenario and past experience raise a 
significant question: if the recommended practices have not been taken up at the 
desired levels by IT professionals in the past, to what extent will they be taken up now 
by the end-users who develop Web and IT applications? 

The third aspect of the evolution concerns the Web itself. The mass participation 
in application development which started with the PC has only increased with the 
Web. Whereas the PC essentially catered to an individual, the Web makes it possible 
for everyone to reach the outside world. This reach is facilitated by a fast-moving 
technology, seemingly easy to grasp but not well understood, and used in a way 
which could fall foul of national and international laws. Whoever creates Web-based 
applications now, whether IT personnel or end-users, not only must understand the 
new technologies of the Web, but also must understand the basics of disciplines 
which they did not need to in the past, such as graphic design, document and link 
management, legal and social obligations, copyright and intellectual property rights, 
multimedia and hypermedia. 

The nature of computing in the era of the World Wide Web has changed and IT 
professionals have to face up to the task of how to deal with it. Web Engineering is 
an attempt at identifying what is fundamental in this new paradigm and what is 
peripheral. 



4 Web Engineering 

‘Engineering’ is a fairly well understood term even though individual disciplines 
within the generic term debate from time to time as to what constitutes their own 
specialisations. The Macquarie dictionary defines it as “the art or science of making 
practical application of the knowledge of pure sciences such as physics, chemistry, 
biology, etc”. The New Oxford Illustrated Dictionary, while describing specific 
branches of engineering, such as civil, mechanical, electrical and military, helpfully 
adds, “engineering requires wide-ranging technical knowledge that enables the 
engineer to communicate and work with other specialists” (emphasis added). 

Berry [12], discussing the term in the context of software engineering, says that 
“engineering is about the systematic application of scientific knowledge in creating 
and building cost-effective solutions to practical problems”. 

The application of ‘engineering’ to software is still being debated (see, for 
example, [23] for a report on a Usenet discussion on this topic, and [24]). However, 
software engineering has been defined as “the systematic application of methods, 
tools and technical concepts to create complex, software-intensive systems that meet 
technical, economic and social objectives” [25]. It is interesting to note that the 
authors go on to suggest that software engineering “involves a number of different 
disciplines... computer science, management, psychology, design and economics. 
Software Engineering, or the research aimed at improving it, also covers a wide range 
of activities, from basic research that will not be applied for years, to the assessment 
of past development projects from which we hope to learn, to the application of 
techniques for future projects.” 

The IEEE Standard Glossary of Software Engineering Terminology [26] defines 
software engineering as “(1) the application of a systematic, disciplined, quantifiable 
approach to development, operation, and maintenance of software; that is the 
application of engineering to software” and “(2) The study of approaches as in (1)”. 

Taking a cue from the above, it can be said that Web Engineering is the application 
of a systematic, disciplined, quantifiable approach to the development, operation, and 
maintenance of Web-based applications, or the application of engineering 
to Web-based software. 

Following the above analysis and discussions, there are three major points worth 
noting, some of which arise from a non-engineering perspective. 

1. The first is that the concept of software engineering still arouses debate which 
ranges from the philosophical [24], in the sense of “can software be 
engineered”, to the practical, as in licensing software engineers [27]. 

2. The second point is that to many non-engineers, ‘engineering’ implies 
somewhat rigid but well formed and well-known methods to do something 
specific, implying less creativity and freedom of action (one only needs to ask 
an architect what s/he thinks of the engineers to get some interesting thoughts 
on the subject!). Such a view leads to scepticism about the appropriateness of 
the engineering discipline to Web-based application development which is in 
total flux. 

3. The third point is quite simply about the desirability of reassuring everyone 
about the seriousness of purpose in proposing an ‘engineering’ discipline. 
Almost everyone trusts the word ‘engineering’ to mean a reliable, responsible 
and professional approach in one’s work. Following the earlier comments 
about the marginalisation of computing professionals, an explicit ‘engineering’ 
outlook is likely to reassure the general public that the quality of software 
(Web-based applications) will be good. However, this has to be constantly 
monitored against what is in fact delivered. The positive side of the evolutionary 
context here is that Web-based application development is still new and there 
is a chance that we may be able to foster good practices over a wide spectrum. 

Web Engineering thus identifies a rapidly growing field with a much more 
forward-looking approach, rather than implying a collection of already existing and 
proven development practices. Web engineering will necessarily involve proposing, 
developing and testing process models suitable for this relatively new environment. 



5 Disciplines Contributing to Web Engineering 

Throughout the paper so far, various disciplines which affect Web-based development 
have been mentioned in the context of specific discussions. The following list is 
simply an enumeration of them without further, repetitive descriptions and is not 
exhaustive. It cannot be overemphasised that this list must be placed in the overall 
context of a good process model and as such it should be read in conjunction with [1] 
as mentioned before. 

The Web engineers must understand and keep up to date with the following: 

• Web technology and its evolving standards, protocols and tools 

• Web site management 

• graphic design 

• document and link management 

• information structuring and hypertext 

• legal and social obligations 

• copyright and intellectual property rights 

• multimedia and hypermedia 

• network and Web performance 

• distributed computing and databases 

• security 

• user profiles and behaviour 

• evolutionary systems development 

• end-user computing 

The Web engineers must also be prepared to be surprised by, and yet undertake, 
completely new types of applications which transcend geographical boundaries very 
easily. 



6 Likely Near-Future Developments 

The following few scenarios are just the tip of the iceberg and are more speculative 
than predictive in nature. 







Engineering is about a systematic, disciplined and quantifiable approach. Among 
the hallmarks of this approach are measurability and repeatability of work. Software 
engineers and other IT professionals lament the fact that the software industry is not 
strong on either. Measurements are scarce and repeatability is achieved more through 
experience and intuition. If we draw lessons from software engineering, they 
should at least include the essentiality of these attributes. There is a tremendous 
opportunity for all IT professionals, academics and researchers to get things right in 
the arena of Web-based application development. 

User-centric approaches and methods to build applications have been gaining 
strength. The openness of the Web makes it feasible to get user feedback (and 
requirements) on-line as opposed to more laborious and expensive traditional 
methods, such as meetings, interviews, paper-based surveys and focus groups. The 
on-line methods have not been tried out yet in any great measure and could prove to 
be very interesting. It is also likely that users now will want a greater say in 
application development and usability testing at earlier stages of the life cycle. 

One of the characteristics of Web-based developments is the rapidity with which 
new applications are put together. This is inevitably going to lead to evolutionary 
approaches to developing such applications. There is a promise of good work coming 
out in this area. 

Web-based applications can now call upon resources and Web-based services from 
all over the world and thus could lead to some very novel applications for users across 
the world. An illustrative case is the Environmental Defense Fund’s Web site and its 
‘scorecard’ application [28]. Such applications, and hence better use of informational 
sources already available, will only increase over time. 

One development which is already affecting the world is the so-called 
‘globalisation’. Where it will lead and what the role of the Web will be in this regard 
is far too speculative and uncertain to attempt here. It needs to be mentioned, 
however, just to remind ourselves that there are more forces at work than those we 
have to combat daily. 

7 Conclusions 

Web Engineering is in its infancy but growing very fast, even if we do not agree on 
the nomenclature at this stage. As participants in Web-based application 
development, we need to take our bearings fairly frequently and identify issues and 
problems, both philosophical and practical. This paper has attempted, in tandem with 
others, to focus on some of them and to highlight some significant areas. 



Abstract. Development of most large Web sites is not an event, but a 
process. Often it is a process without a well-defined ending point. In 
order to allocate resources and develop a Web site, we need to divide 
the overall process into a set of sub processes that are well defined and 
have measurable outcomes. 

To identify the required sub processes, we first need to understand the 
broader issues and specific requirements of the stakeholders. This is 
known as the context analysis. Next we can bring in technologies and 
develop an overall architecture or a product model to solve technology 
related issues. 

Once we have a product model we can identify the sub processes required 
to implement it. We also need a set of sub processes 
to address the non-technical issues identified during the context 
analysis phase. These sub processes can be converted into a project 
plan by allocating resources and setting a time schedule. Based on the 
project plan, development activities can take place and, when they are 
completed, the project moves into a maintenance phase. 

This paper describes how we used this systematic approach to develop a 
large maintainable Web site. 



1 Introduction 

There have been many arguments as to whether Web sites can be engineered or 
not [1]. Will an engineering approach allow for creativity and innovation? 

Rather than trying to argue one way or the other, in this paper I will share with you 
how we went about developing a large Web site and reflect on the experience gained 
and the process used. 



2 University of Western Sydney Macarthur (UWSM) Web Site 

I became involved in this project at the beginning of 1998, when UWSM, like many 
other universities, had a Web site up and running. When developing this Web site, a lot 
of thought had gone into identifying who the users were, what information they needed 
and how this information was going to be presented in a way that was easy to find. 

At this stage the UWSM Web site was also experiencing some problems. In 
addition to the main university Web site, some faculties and departments had created 
their own Web sites. Some of the original information had changed and evolved. 
Some new subjects and courses had been introduced and some others were removed. 

Some pages in the Web site had been edited to reflect these changes while others 
were overlooked. In some instances the information contained in the main university 
Web site did not match with information contained in some faculty and department 
Web pages. For example, on different Web pages the fees quoted for a given course 
were different. In some pages it was the old fees (sometimes 2 or 3 years old) and in 
some it was the proposed fees, which had not yet come into effect. 

Also, maintaining the information in the Web site had become a very resource 
intensive and time-consuming task. 

Thus, in summary at this time the UWSM Web site was facing the following 
problems: 

• Information within this Web site was not consistent, 

• Information in this Web site and other university Web sites was not consistent, 

• Maintaining the Web site had become a very resource intensive and time- 
consuming task, 

• There was the need to put more information on to the Web site. As people looking 
after the Web site had no more resources to allocate for these tasks, adding new 
information was not happening. 

The UWSM Web site was also facing another problem, which was somewhat 
unique to this site because of the management structure of the University of Western 
Sydney (UWS). 

At the time UWS was a federation of three members, UWS Macarthur (UWSM), 
UWS Nepean (UWSN) and UWS Hawkesbury (UWSH). There was a high level 
need to have some form of a common branding in all three member Web sites to 
indicate that they all belong to one federated University. 



3 Redeveloping the UWSM Web Site 

In trying to re-develop the UWSM Web site we took the following steps: 

• Identify the corporate requirements and develop an overall policy document, 

• Identify the problems specific to the UWSM Web site and identify possible 
solutions, 

• Develop an overall architecture for the Web site based on the proposed solution, 

• Develop a process model to implement the solution. 

3.1 Overall Policy Framework 

A committee consisting of representatives from all three members was formed to 
formulate the UWS wide Web policy. This addressed the following areas: 

• What is required in terms of corporate branding, 

• Who is authorised to publish on the various Web sites. The policy specifies a 
decentralised approach where the people publishing the information have to take 
responsibility for it, 

• Attributes that published information needs to have, such as being accurate, 
non-abusive and current, 

• Style guide, 

• Mechanism to develop a corporate branding, 

• Dispute resolution mechanism. 

3.2 Solutions to Problems Specific to the UWSM Web Site 

To address the problems faced by the existing UWSM Web site, the new Web site 
had to incorporate the following features: 

• Way of adhering to a common branding specified by the overall policy 
framework and to easily change the look and feel if the branding changes. This 
requires creating Web pages dynamically using templates. 

• To make information within this Web site as well as across other UWSM Web 
sites consistent, all this information needs to come from a single source. Thus if 
the information changes, it has to be changed in one place only and that should 
then be reflected in all Web pages that display this information. 

• To minimise the difficulties associated with maintaining the Web site, there 
should be an easy and de-centralised process to change the information. 

• It should be easy to add new pages to the Web site as and when required. 
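
The first two features above (template-driven pages and a single source of information) can be illustrated with a minimal Python sketch. It is not the UWSM implementation, which this paper does not describe at code level. The course record, the fee value and the two templates are hypothetical, and the standard library `string.Template` class stands in for whatever templating mechanism a real site would use.

```python
from string import Template

# Hypothetical single source of course facts; every page reads from here.
COURSE_DATA = {"B101": {"title": "Bachelor of Computing", "fee": 4500}}

# Hypothetical templates: one for the main site, one for a faculty site.
# Changing the branding means editing a template, not individual pages.
MAIN_TEMPLATE = Template("<h1>UWSM</h1><p>$title: fee AUD $fee</p>")
FACULTY_TEMPLATE = Template("<h2>Faculty</h2><p>$title costs AUD $fee</p>")


def render(template: Template, course_id: str) -> str:
    """Build a page on request from the shared data and a template."""
    return template.substitute(COURSE_DATA[course_id])


if __name__ == "__main__":
    # Updating the fee in one place changes every page rendered afterwards.
    COURSE_DATA["B101"]["fee"] = 4800
    print(render(MAIN_TEMPLATE, "B101"))
    print(render(FACULTY_TEMPLATE, "B101"))
```

Because both pages are generated from the same record, the fee cannot differ between the main site and a faculty site, which was the inconsistency reported in Section 2.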



3.3 Overall Architecture 

We arrived at an overall architecture as follows: 

In order for information to be consistent it has to come from a single source. If this 
source is used by all Web sites, then we have consistent information across all Web 
sites. To meet this requirement we designed a corporate data repository for all 
business specific information and provided a mechanism for all UWSM Web sites to 
get the information from this data repository. 

The front ends to this corporate data repository are the various public Web sites, 
the UWSM Web site being one of them. All these Web sites need to create their Web 
pages dynamically, extracting information from the corporate data repository. 

Then we included an information management framework to manage the 
information in the corporate data repository and a back end system to implement this 
framework. This back end system enabled faculty staff to create information on 
subjects, courses, policies and staff details using a set of standard form interfaces. This 
information gets stored in various databases in various faculties. When the 
information is fully developed it is submitted to the appropriate committee for 
approval. Once approved, the secretary to that committee transfers information to the 
corporate data repository. 
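
The approval path described above (create, submit, approve, transfer to the repository) can be read as a small state machine. The following Python sketch shows one way to model it. The state names and the rule that a committee may send an item back for revision are assumptions made for illustration and are not taken from the UWSM back end system.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"          # being written in a faculty database
    SUBMITTED = "submitted"  # with the approving committee
    APPROVED = "approved"    # approved, not yet transferred
    PUBLISHED = "published"  # stored in the corporate data repository


# Allowed moves between states (illustrative names and rules).
TRANSITIONS = {
    Status.DRAFT: {Status.SUBMITTED},
    # Assumption: a committee can return an item for further work.
    Status.SUBMITTED: {Status.APPROVED, Status.DRAFT},
    Status.APPROVED: {Status.PUBLISHED},
    Status.PUBLISHED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move skips a step."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

In this model an item cannot reach the repository without passing through the approval step, which is the property the information management framework is meant to guarantee.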

We also had to have a mechanism to check whether the published Web pages 
adhere to the overall policy. We achieved this by specifying a set of 
Metadata that has to be in every Web page and developing a Web robot that 
frequently checks these. If it detects that a page is about to expire, it sends an email 
to the author advising the author of the situation. The overall architecture is shown in 
Figure 1. 
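
A robot of the kind described above could, as a rough sketch, parse each page for its metadata and flag pages that are close to expiry. The paper does not list the actual metadata set, so the field names `author` and `expires`, the ISO date format and the 14-day warning window below are all assumptions. Fetching the pages and sending the email are left out, and the function only reports what it would notify.

```python
from datetime import date, timedelta
from html.parser import HTMLParser
from typing import List


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name, content = fields.get("name"), fields.get("content")
            if name and content:
                self.meta[name.lower()] = content


# Assumed required fields; the real policy may have specified others.
REQUIRED = ("author", "expires")


def check_page(html: str, today: date, warn_days: int = 14) -> List[str]:
    """Return the policy problems found in one page."""
    parser = MetaCollector()
    parser.feed(html)
    problems = [f"missing metadata: {name}" for name in REQUIRED
                if name not in parser.meta]
    expires = parser.meta.get("expires")
    if expires:
        # Assumes the hypothetical "expires" field holds YYYY-MM-DD.
        remaining = date.fromisoformat(expires) - today
        if remaining <= timedelta(days=warn_days):
            author = parser.meta.get("author", "unknown author")
            problems.append(f"expires soon: notify {author}")
    return problems
```

Running `check_page` over every published page on a schedule would give the list of authors to contact. A real robot would also need to crawl the site and handle malformed dates.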




Figure 1 - Overall Architecture (diagram not reproduced; surviving label: Approval Process, Education Committee etc.) 



3.4 Process Model 

Once we developed the overall architecture we were able to divide the project into a 
few different sub projects. Some of these projects were aimed at developing various 
components for the overall Web site. We also identified some requirements 
associated with the need to enhance the skills base and to increase the awareness of 
the Web and its impact. Some of the sub projects that we started were to address 
these issues. Detailed discussion on this human aspect is given in [2]. Table 1 gives a 
summary of the various sub projects. 

We developed a set of specifications for each of these sub projects and gave them 
to Champions to implement. These projects had defined starting points and 
ending points and a way of measuring the progress. 

Each project group developed a process for them to follow. These processes 
included activities that would ensure that the final outcome would meet the original 
specification. For example, the process used to develop the front end of the UWSM 
Web site (sub project 1) had activities such as design for scalability, design for 
navigation, design for maintenance etc. 

Table 1 - Summary of sub-projects 

Project                 Details
----------------------  ----------------------------------------------------------
1. Develop a new        Design of this Web site should take into account the
   UWSM Web site        following factors:
                          1. Design of information structure
                          2. Maintenance and scalability
                          3. Ease of finding information and navigation
                          4. Appearance and presentation
                          5. Web policy
                          6. Functionality and interactivity
                          7. Feedback from possible users
                        It is important to document the process, as this will be
                        used as an example when developing faculty and division
                        Web sites.

2. Develop faculty and  To get interested/responsible people from faculties and
   divisional Web       divisions together to arrive at a consistent format.
   sites

3. Increase general     This would include a presentation to Senior Executive
   awareness among      Managers and asking Deans and Managers to nominate a
   Senior Management    Web Coordinator and a Technical Manager for their Web
                        sites.

4. Make necessary       This would include making appropriate changes to current
   changes to HR        Job Description and to other relevant HR policies to
   policy               accommodate Web-related developments.

5. Develop Web-related  This would include organising appropriate training
   skills among         sessions and setting up discussion groups and a Web site
   employees            containing resource material.

6. Develop UWSM         This project is to develop a UWSM wide information
   Information          management system. This needs to facilitate the workflow
   Management           associated with developing subject and course outlines
   Infrastructure       and enable the transfer of this information into the
                        corporate data repository after it has been approved by
                        the relevant committees.



4 Current Status and Some Observations 

We have finished sub projects 1, 6 and 7. Other projects made varying degrees of 
progress. From November 1999 the whole University management structure started 
to change towards forming a unified University. Thus these activities were put on 
hold in April 2000. 

A paper titled “Blue Print for UWS Web Site and Information Management 
Framework” was submitted to the senior management based on the experience 
gained. Since the original design was done acknowledging that change was 
inevitable, it is not difficult to adopt the framework that we developed for the new 
university structure. 

When we first embarked on this project, everyone had different ideas as to how we 
should go about it. Different people involved in the project emphasised the need to 
pay attention to different aspects of the Web site, such as look and feel, ability to find 
information easily, cost, lack of time to carry out development and maintenance, 
problems with information inconsistency etc. 

Only after we put in place a process that was going to address each and every one 
of these issues were we able to make real progress. This immediately enabled the 
people to see how to divide and conquer the problem and how everything fitted 
together. 



5 Abstracting the Process 

If we abstract the process that we adopted, one can identify the following stages: 

1. Understanding the environment, corporate requirements and stakeholders. 

2. Understanding the specific major problems stakeholders have that need to be 
addressed. 

3. Developing an architecture for the overall Web site and solutions to non- 
technical issues. 

4. Developing sub projects or sub processes to implement the architecture and 
solutions to non-technical issues. 

5. Implementing these sub projects. 

If the sub projects are too complex to manage, the same process can be applied to 
these sub projects until we end up with a set of manageable tasks. 
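
The repeated application of the process to sub projects amounts to a recursive breakdown. A minimal Python sketch is given below. The `Task` structure and the example names are hypothetical, and a sub project is simply treated as manageable once it has no further parts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    # Sub projects this task was divided into; empty when it is manageable.
    parts: List["Task"] = field(default_factory=list)


def manageable_tasks(task: Task) -> List[str]:
    """Apply the same breakdown at every level and collect the leaf tasks."""
    if not task.parts:
        return [task.name]
    names: List[str] = []
    for part in task.parts:
        names.extend(manageable_tasks(part))
    return names


if __name__ == "__main__":
    site = Task("Web site", [
        Task("Front end", [Task("Navigation"), Task("Templates")]),
        Task("Training"),
    ])
    print(manageable_tasks(site))
```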

The overall development process is shown in Figure 2. 

The first two steps that we carried out to understand the broader issues form part of 
the context analysis. This is the first step any Web site development project needs to 
undertake. 

Once we understand the broader issues we can start putting technologies together 
to address technology related issues. The “Product Model” represents the overall 
architecture of the Web site to be developed. 

Figure 2 - Overall Development Process 



We then have to define the sub processes required to implement the architecture 
and to address the non-technical issues. This is known as the process model. 

These sub processes can be converted into a project plan by allocating resources 
and setting a time schedule. Based on the project plan, development activities can 
take place and, when they are completed, the Web site moves into a maintenance phase. 

An important point is that, from time to time, the environment, corporate requirements, 
stakeholders, information etc. can change. When this happens, depending on the 
nature and magnitude of the change, we will have to perform appropriate parts of 
some of the relevant sub projects again. 



6 Web Engineering 

What enabled us to develop a new UWSM Web site within 9 months was that we 
adopted a process that had activities to address various concerns, issues and problems 
of stakeholders. A broader definition of engineering is the application of scientific 
and mathematical principles to practical ends such as the design, manufacturing, and 
operation of efficient and economical structures, and systems [3]. We used a 
systematic approach incorporating known methodologies and technologies to solve a 
problem in a repeatable, measurable and cost effective manner. Thus one can argue 
we have engineered a Web site. Further information on how to engineer Web sites 
can be found in [4, 5, 6]. 



7 Conclusion 

I would like to conclude the paper by summarising a few important points with regard 
to developing large maintainable Web sites. 

• We need an overall approach to manage the process of development and 
maintenance of the Web site. 

• Development of a Web site in many cases is not an event, but is a process. It will 
have a start, but will not have a predictable end. 

• Within this continuous process we need to identify various activities that will 
have a defined start and a finish. This is very important, as without these well 
defined activities the overall process becomes unmanageable, and there is no 
way to allocate resources and measure the progress. 

• Often with changing purposes for having a Web site, requirements, information 
etc, we will have to repeat these activities to keep the Web site current. 

• Often we forget about the big picture and concentrate on a specific activity only 
to find out that at the end of the activity requirements or information have 
changed and we have to repeat the process all over again. If this changing nature 
of requirements, information, look and feel and all other aspects were considered 
within the activities that were carried out to develop the Web site, then one can 
build into the design efficient and cost effective ways of managing the change. 

Finally I would like to stress that without a well-defined systematic approach, it is 
impossible for a group of people to effectively work together on a large Web 
development project. 



Web-Based Systems Development: Process and 
Methodology 



1 Overview 

The engineering approach places a strong emphasis on product and process 
modelling. The previous section defined and elaborated on what Web Engineering is 
and why it is needed. This logically leads to questions about product and process 
modelling or methodologies. How well do the currently available models meet the 
needs of Web Engineering? Do we need to create new models and why? Are there 
examples of good and bad practices that one can learn from? The seven papers in this 
section tackle these questions head-on and come up with some interesting 
observations and answers. 

The first paper, Corporate Web Development: From Process Infancy to Maturity - 
A Case Study, reports on the Web development in a university environment over a 
number of years. Several practitioners have commented on the ‘accidental’ starts of 
Web developmental activities in various organisations and the lack of a ‘road map’ 
for such work (see, for example, Challenger et al in section 5). The case study 
detailed here suggests that the Web development processes even within an 
organisation are likely to be at different maturity levels, if pursued by different units 
and independently. These maturity levels may be characterised as: a) ‘infancy’ 
(ad-hoc methods adopted to put the organisation/unit on the ‘Web map’), b) ‘early 
childhood’ (less ad-hoc methods and awareness of the fact that mere presence on the 
Web is not enough to attract the attention of the rest of the world), c) ‘growing up’ 
(evolutionary methods and development of Web-based applications) and d) 
‘adulthood’ (evolutionary methods and a corporate plan for systematic information 
structuring and development of Web-based applications and activities). The paper 
formulates and discusses the maturity levels in terms of focus, size, lifetime, 
development process, the developers themselves, the extent of prior analysis of user 
profiles, scalability and maintainability. 

The second paper, Applying Cross-Functional Evolutionary Methodologies, draws 
upon practical, industry-focused experience under the time-to-market pressures of the 
Internet industry to discuss the methodological issues before recommending a mix of 
rapid application development, evolutionary delivery and joint application 
development adapted to the unique climate of the Web. 

The paper suggests that Web development is different because the people who 
build Web sites are different; that Web development tasks are often done in parallel 
rather than sequentially; and that Web sites are usually not discrete systems, but rather 
a blended convergence of code, visual design elements and editorial content. Web 
application development requires a great deal of communication and team synergy for 
success. The outcome is more collaboration with non-technical personnel during the 
implementation stage than what happens in a typical software project. Furthermore, 
the Web environment is an immediate medium, unimpeded by the manufacturing, 
distribution and sales channel delays inherent in shrink-wrap software development. 

The paper reports on data collected from 23 projects over a 16-month period, 
roughly equally divided into those which used the recommended methodology and 
those that did not. While reporting a better performance for the former, the paper 
does underline the need for more detailed and quantitative studies before drawing any 
firm conclusions. 

S. Murugesan and Y. Deshpande (Eds.): WebEngineering 2000, LNCS 2016, pp. 33-35, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 

The third paper, Development and Evolution of Web-Applications using the 
WebComposition Process Model, argues that from a software engineering perspective 
the World Wide Web is a new application platform. The implementation model that 
the Web is based on makes it difficult to apply classic process models to the 
development and, even more, the evolution of Web-applications. Component-based 
software development seems to be a promising approach for addressing key 
requirements of the very dynamic field of Web-application development and 
evolution, including re-use. However, such an approach requires dedicated support. 
The paper describes the WebComposition Process Model which introduces the 
concept of an open process model with an explicit support for reuse to develop 
component-based Web-applications. The WebComposition Process Model uses an 
XML-based markup language to seamlessly integrate with existing Web-standards. 
The process model addresses the need for a controlled evolution of Web applications 
through the use of domain-components to describe the application domains. 

The next paper, Engineering the Web for Multimedia, tackles the engineering 
issues in developing multimedia applications on the Web, with particular emphasis on 
digital video. The nature of digital video brings additional complexity to engineering 
solutions on the Web due to: a) the large data sizes in comparison with text, b) the 
temporal nature of video, c) proprietary data formats, and d) issues related to 
separation of functionality between content creation, content indexing with associated 
metadata, and content delivery. A case study of a system for searching and browsing 
of video and related material in a video-based Web application, CueVideo, is used to 
illustrate the issues, the different component technologies involved in deploying 
video-based Web applications, and the tradeoffs involved with each option. 

The fifth paper, Modelling Security Policies in Hypermedia and Web-based 
Applications, is on security, which is a key requirement of any multi-user application. 
Hypertext security models frequently address security issues at storage, access and 
privacy (unclassified, secret, top secret) levels. This paper tackles the issue from a 
specifically hypertext angle, i.e. text manipulation levels, viz. browsing, authoring, 
and usage, the last one being the ability to include new objects into the nodes to 
facilitate personalisation. 

There has been a lot of discussion of user-centric approaches in application 
development. Web-Based Information System Development: A User Centered 
Engineering Approach, examines currently available user-centric methods (OOHDM, 
RMM, W3DT, Araneus) that were originally designed for hypertext or hypermedia 
applications or are data driven or implementation oriented, but do not address 
specifically the usability problem, crucial in the Web environment. The WSDM 
method tackles the usability problem but is only suitable for the design of kiosk sites 
that mainly provide information and allow users to navigate through that information. 

The paper describes a six-step methodology which explicitly takes into account the 
informational and applicative goals of a Web application. It uses conceptual, logical 
and physical levels of abstraction and models the potential users and their behaviour. 
It also recommends the inclusion of explicit navigational modelling. 

The last paper, Rapid Service Development: An Integral Approach to E-business 
Engineering, in this section on Process and Methodology, applies the engineering 
approach to E-business. Developing e-business solutions is a complicated task. It 
involves many different disciplines and requires knowledge of e-business 
technologies as well as business processes. This paper examines the causes of the 
complexity, and discusses an approach to overcome the current barriers in e-business 
engineering. The approach combines knowledge of processes and technology with a 
new e-business engineering methodology, called Rapid Services Development (RSD). 
The components of RSD are discussed in detail, and linked to current engineering 
approaches. 



Abstract. Web Engineering is about the use of systematic methods to 
develop Web sites and applications. The choice of a suitable 
development model, according to practitioners and researchers, is 
dependent upon many variables such as characteristics of the planned 
site (and applications), its document orientation, content and graphic 
design, budget and time constraints, and the changing technology. It is 
also generally assumed that a development model is selected at the 
beginning of a project. However, most of this discussion concentrates 
on Web related issues. This paper reports on a case study of Web 
development, which started as end user computing (ad hoc) activity and 
then developed into a conscious effort to formulate and promote a 
systematic approach. The evolution from process ‘infancy’ to maturity, 
within an organisation, brings into sharp focus ‘non-Web’ factors that 
are crucial to the success of any Web project. 



1 Introduction 

The World Wide Web continues to spread in its reach. The Web also gets more 
complex and varied in the available and developing range and numbers of 
applications, technologies and their overall impact. Consequently, there is a great 
concern, especially among the practitioners and researchers about the way Web sites 
and applications are developed and maintained [1, 2, 4, 6, 7, 8, 12-14, 20, 21]. 

Among the main issues of concern is the type of development model the Web 
developers should adopt in their work. Web development is seen as similar to 
software development in some aspects but very different in other ways, given its 
multidisciplinary and network dimensions and the far wider range of ‘users’ [4, 6]. 
Consequently, there have been numerous attempts to identify the important success 
factors and to build models to address specific concerns. There is also a general 
agreement that any one model will not suit all purposes and circumstances [6, 17] 
although there is a recognition that an evolutionary or incremental approach is better 
suited to the Web projects [14]. There is thus a need to conduct further studies to 
establish the correlation between different models and the specific environment under 
which they could be successfully adopted to serve specific purposes. 

This paper reports on one such case study which, like the Web, continues to 
evolve. The (Web) project we have been involved in has over time exhibited ad hoc 
to systematic approaches as well as bottom up and top down strategies and has met 
with mixed success for a variety of reasons, including those to do with the 
organisation and people. Our experience and analysis, however, enable us to propose 
that, in an organisational context, the adjectives ‘incremental’ and ‘evolutionary’ 
apply even to the recognition of the importance of systematic development and 
subsequent adoption of appropriate methods, since most people only slowly realise 
that Web development requires a disciplined approach. We describe this 
phenomenon as ‘from process infancy to maturity’ and suggest that the speed of this 
evolution will depend largely on the characteristics of the organisation concerned. 

The paper is organised as follows. Section 2 briefly reviews the literature on Web 
Engineering and related disciplines to identify and discuss the important factors 
affecting the success and failure of a Web project. Section 3 describes and analyses 
the case study. Section 4 comments on and discusses the lessons learnt during the 
course of the case study, drawing upon studies from end user computing and diffusion 
of innovations, to derive extra factors that could make or break a Web project. 
Section 4 also makes a few recommendations based on the analysis and discussion. 
Section 5 identifies future work and concludes the paper. 



2 Web Site and Application Development and Web Engineering 

Web Engineering is about the use of systematic methods to develop Web sites and 
applications. There has been a fairly detailed analysis of what is involved in Web 
development, informed by the points of view of software engineering, graphic design, 
and hypertext and multimedia design [6, 7, 12, 14]. The easily identified factors 
which influence the development of a Web site and, hence, the processes to be 
adopted, include its purpose and functionality (scope, size and complexity), the 
changing technology, document orientation, importance of content and of graphic 
design, and budget and time constraints. 

This analysis has also led to a general agreement that any one model will not suit 
all purposes and circumstances; that, in reality, the successful outcome of a Web 
project depends upon other factors, such as people, internal politics, the divide 
between theory and practice and a general lack of understanding of the possible 
effects of the Web itself on an organisation and the way it functions [12, 14]. 

Common to all the discussions concerning models, development approaches and 
various engineering strategies (software, Web site, hypermedia, usability et al) is an 
assumption that ‘management’ is committed and knowledgeable, at least in the sense 
of letting the professional (teams) get on with the project. 

There is also some analysis of what may go wrong with Web projects. Nielsen [9], 
for example, identifies several areas of concern and mistakes, such as business model 
(Web as a marketing brochure), project management (outsourcing without 
coordination), information architecture (Web site mirroring the organisation chart), 
page layout (heavy graphics for “good” looks), content authoring (appropriate text for 
online readers) and linking. 

However, in spite of all the experience and exhortation of practitioners and 
researchers, Web projects are frequently badly executed and result in poor quality. 

In our experience, the origins of poor work lie in the deceptive simplicity of the 
visible Web sites and applications, and the openness of their ‘code’ in the form of 
HTML tags. Many end-users, departments and, often, top management are misled 
into treating the Web projects as ‘page construction to put a public face to the 
organisation’. They do not necessarily see the projects as complicated by any of the 
issues and areas of concern articulated by the experts. Their views are further 
strengthened by the marketing tactics of the software industry, which promise trouble 
free and quick development, deployment and maintenance if they use product x or y. 
It needs patience, perseverance and time, and more importantly, workable practices to 
overcome these perceptions. 

This case study had its origins in an environment not too dissimilar to the one 
outlined in the previous paragraph. The Web was too new for most of the people at 
the time; the organisation (the university) is collegial in nature, and allows a fair 
degree of freedom to the staff in developing their Web ‘sites’ and the functional units 
are not tightly knit to develop a unified approach. In fact, effects of these 
characteristics have been studied in some detail in diffusion of innovations by Rogers 
[15] and that analysis has a strong bearing on the outcomes of the case study, which is 
explored in Section 4 after the description of the case study. 



3 Description of the Case Study 

3.1 UWSM Organisation 

Like all other universities, UWSM is collegial, organised in functional units mainly 
along academic and non-academic lines with a commitment to academic freedom. 
Figure 1 shows part of the organisation structure (including only one of the five 
faculties for illustration). 

Figure 1: Organisation Chart of UWSM (source: UWSM Calendar, 1998) 



While various faculties and administrative units communicate with one another, 
each unit functions more or less autonomously. IT Services bear the overall 
responsibility for IT facilities and major applications, with the users allowed to 
supplement these with their own systems and applications. The stages of Web 
development that this paper reports are thus very much a result of the UWSM 
organisation. 



3.2 Overview of the Case Study 

Web development within the University of Western Sydney Macarthur (UWSM) 
started in 1995-96 but not as an overall, conscious effort and has never been formally 
constituted as a monolithic project. The case study, therefore, does not report on a 
specific, well organised project from its beginning to end. Rather, it traces and 
describes the evolution of major activities and thinking relating to Web development, 
which led us to propose and hold the first Web Engineering Workshop at the WWW7 
conference in Brisbane in 1998, (and subsequently at WWW8 in Toronto in 1999 and 
the International Conference on Software Engineering, ICSE99 in Los Angeles, also 
in 1999). We believe that this case history is not dissimilar to many others and is 
probably being repeated in different parts of the world in varying degrees. 

Briefly, the first (static) Web sites within UWSM appeared in late 1995 and early 
1996, progressed through isolated, independent and ad hoc stages to more systematic 
development of sites and Web-based applications culminating in an overarching, 
university-wide but non-monolithic project, which still allowed university 
departments and individuals enough freedom to create their own sites and 
applications. 

The evolution of these Web sites and other web-based developments can be better 
understood by following the analysis proposed by Lowe and Hall [6], augmented by 
the characteristics of a case study as a narrative exercise and three additional criteria, 
drawn from direct experience, as explained below. 

3.2.1 Characteristics of Hypermedia Applications 

Lowe and Hall [6] identify two major dimensions of hypermedia applications: 

• focus: on a continuum from presentation to structural 

• size: from small to large. 

They add two more dimensions to these two, lifespan (short and long) and 
development approach (ad-hoc and evolutionary), but closely associate them with 
focus and size. For the purpose of this paper, we treat the four dimensions as 
independent. 

3.2.2 Case Study as a Narration 

On a general level, the description of a case study is like a narration for which 
journalists and writers have a rule of 5Ws and 1H (who, what, where, when, why and
how). The four dimensions above cover, in order, why (focus), what (size and 
lifespan) and how (development approach). ‘Where’ and ‘when’ make the particulars 
of the case study and the subsections below. The ‘who’ is an interesting aspect of the 
entire Web development. Lowe and Hall probably equate ‘who’ to ‘Web 
professionals’, and hence do not elaborate on it, whereas the case study includes other 
players as well.

3.2.3 Additional Dimensions

Web projects and development generally involve dealing with masses of information 
created by a multitude of agencies within the same organisation and accessed by, in
all likelihood, an even larger number of end-users. Further, the information content 
continues to grow over the life cycles of such projects. Finally, the projects may be 
developed by end-users themselves. Consequently, we add the following dimensions 
in order to better understand the maturity levels attained and exhibited by different 
groups in pursuing Web projects: 

• developers (end-users to experts) 

• user analysis (absent, implicit, explicit) 

• scalability (none to planned) 

• maintainability (none to planned) 



3.3 Stages of Development 

In the following, we identify four stages of (Web) development linked to the levels of
maturity within an organisation, calling them infancy, childhood, growing up and
adulthood. We also point out, based on this case study, that all these levels of
maturity may be present within an organisation at the same time. The implications
of these maturity levels in formulating Web development policies are discussed in the 
next section. 

3.3.1 Stage 1 (‘Infancy’) 

The first attempts were directed at page construction, in early 1996, mistakenly
labelled as Web sites. The progression was: IT Services, Research Office, Library,
B&T (Faculty of Business and Technology), and ODER (Office of Development and 
External Relations). As the organisation chart makes clear, they are all independent 
units. Each one has a different point of view of how it wanted to present itself to the 
world. The units also differ in levels of IT expertise. These Web sites were 
uncoordinated, unsophisticated efforts by enthusiasts, which effectively created 
‘islands of information’, occasionally contradictory and always incomplete. 

In Lowe and Hall terms, the projects had the following characteristics: 

• focus: presentation 

• size: small 

• lifetime: short 

• process: one-off (ad-hoc) 

With the benefits of hindsight and evolutionary understanding, we can now add the 
‘who’ (the developers) and other characteristics to this list: 

• developers: enthusiasts, amateurs (end-users) 

• user analysis: absent 

• scalability: none 

• maintainability: minimal 

Further, these projects did not undertake any systematic analysis of the desired
qualities of the sites nor did they benefit from or lead to policy and procedures to
improve the development processes. In essence, they were attempts to draw attention
to the fact that various units ‘existed’ and wanted to be noticed. The processes were 
essentially in ‘infancy’ stage. 

3.3.2 Stage 2 (‘Early Childhood’) 

The evolution of the B&T faculty site represented the next stage for several reasons. 
The Web masters of this site (the first author and another colleague) came from an IT
background and saw that stage 1 seemed to repeat the experience of early software
development, which was to use ad hoc methods to implement first and then deal with 
the problems as they arose, without much prior planning. 

The initial focus of the faculty Web site was still presentational but other 
dimensions were significantly different. The size of the site increased to a level 
making maintenance of the static pages difficult. The routine maintenance of the 
static Web pages could not be easily delegated to the faculty supporting staff because 
their ‘job descriptions’ did not include such tasks nor did they have the necessary 
technical background. The ‘ownership’ of the information came into dispute with 
ODER claiming exclusive rights, which raised a question about academic freedom. 
We also realised, from discussions with the university staff from different units, that 
there are more angles/views to a Web project than just software development [2]. 

The project now displayed the following characteristics: 

• focus: presentation 

• size: medium 

• lifetime: medium term 

• process: evolutionary 

• developers: IT experts and enthusiasts, motivated to become Web 
professionals 

• user analysis: absent (in the ‘traditional’ sense but see below) 

• scalability: none 

• maintainability: some 

There were two major outcomes of this and the next stage. The first one was an 
elaboration of the term ‘user’ [4]. The normal usage of the term implies the users at 
whom a Web site (and its information and functionality) is directed. We call them 
‘passive’ users, even if they are able to interact with the site. In addition to them, we 
now had another type of user. The routine maintenance and authoring of the Web site 
needed administrative assistance or ‘active’ users. ‘Passive’ user analysis was absent 
at this stage but we realised that attention had to be given to the ‘active’ users, raising 
some organisational issues in the form of training and job classifications. 

The second outcome was a very conscious comparison with (our understanding of) 
software development and the need for processes appropriate to Web development. 
This consciousness led during the course of stage 3, below, to the proposal of the first 
workshop on Web Engineering at WWW7 in Brisbane in 1998 [8] and subsequently
to the formulation of Master of Information Technology in Web Engineering and 
Design. The processes thus moved on from their ‘infancy’ stage. 

3.3.3 Stage 3 (‘Growing Up’) 

The second stage of the mostly static, presentational B&T faculty site almost immediately
led to the development of a dynamic site and Web-based applications. There was a
general realisation that information kept changing, highlighting problems of
consistency and accuracy. The Web was also recognised as an environment for 
distributed applications for both teaching and administration. The Web development 
thus moved from the static (‘I exist’) stage to the active (‘can do’) stage.

Accordingly, this part of the Web development, confined to the B&T faculty, 
displayed the following characteristics: 

• focus: structural and presentational 

• size: medium to large 

• lifetime: long (in Web chronology) 

• process: evolutionary 

• developers: Web ‘professionals’ 

• user analysis: some analysis of both ‘passive’ and ‘active’ users 

• scalability: recognised as an issue 

• maintainability: improved 

However, the picture was relatively unchanged outside this faculty. Across the 
University, the efforts were still largely uncoordinated but there was a growing 
recognition of the need to connect the information islands, if not to merge them. 
Consequently, a university-wide Web committee was formed with representatives 
from all the major units and faculties. Lack of technical expertise and the different 
stages of Web development within the University units meant that the main role for 
the Committee was to develop policies and procedures and to facilitate ongoing 
dialogue across all disciplines and units. There was a conscious effort to transform 
the predominantly amateur, end user computing effort into more systematic work. 

In other words, there was, at this time, a divide within the University both in the 
use (focus) of the Web as a medium and the processes employed to develop the Web. 
Stages 2 and 3 thus point to the real possibility of different levels of process 
‘maturity’ existing simultaneously within an organisation at any given time. 

3.3.4 Stage 4 (‘Adulthood’) 

The development of Web-based applications in the B&T faculty and advances in the 
World Wide Web itself led to stage 4 where the University management understood 
and accepted the necessity of widening the brief of the committee mentioned above. 
Members of the top echelons joined the committee, sanctioned resources and helped 
to establish priorities (focus) for Web development. These are normal functions of 
the top management. The distinctiveness of this stage came from the fact that the 
committee understood the importance of the processes, debated them and for a short 
period oversaw the total development. The outcome of this (mature) approach was the 
formulation of seven university-wide projects which did not simply concentrate on the 
‘products’ or ‘deliverables’ but also encompassed, among other things, information 
management restructure, reconsideration of HR (human relations) policies and 
training for all university staff. The result was a blueprint for UWSM sites and 
information management, which could be then devolved to individual departments or 
faculties [3]. 

The characteristics of this stage of Web development, within the University, may 
be summarised as:

• focus: structural and presentational

• size: medium to large (depending on the individual projects) 

• lifetime: long 

• process: evolutionary 

• developers: Web ‘professionals’ 

• user analysis: some analysis of both ‘passive’ and ‘active’ users and 
recognition of need for more detailed analysis 

• scalability: planned 

• maintainability: planned 

However, we do not claim that the processes have now reached a ‘mature’ level, 
especially since their uptake is still selective outside of these seven projects. 

3.3.5 Summary 

The case study illustrates the four stages of evolution of Web development and 
process maturity within the University. Table 1, below, summarises the
characteristics exhibited at the four stages of development. Several questions may be
raised about this analysis. Three pertinent ones are taken up in the next section. The 
first one concerns generalisation of these observations to other organisations. The 
second one is whether specific policies could be recommended for successful Web 
development within an organisation. The last question is how the Web development 
has proceeded within UWSM. 
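
As an illustration only, the eight characteristics used in the stage descriptions above can be held in a small data structure, which makes it easy to see which dimensions change between two stages. The sketch below is not part of the case study; the class name, field names and the helper `changed_dimensions` are assumptions introduced here, and the values are abbreviated from the bullet lists in Sections 3.3.1 to 3.3.4.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StageProfile:
    """One maturity stage described along the eight dimensions of Section 3.2."""
    focus: str
    size: str
    lifetime: str
    process: str
    developers: str
    user_analysis: str
    scalability: str
    maintainability: str


# Values abbreviated from the stage descriptions (hypothetical encoding).
STAGES = {
    "infancy": StageProfile(
        "presentation", "small", "short", "one-off (ad hoc)",
        "enthusiasts, amateurs", "absent", "none", "minimal"),
    "early childhood": StageProfile(
        "presentation", "medium", "medium term", "evolutionary",
        "IT experts and enthusiasts", "absent", "none", "some"),
    "growing up": StageProfile(
        "structural and presentational", "medium to large", "long",
        "evolutionary", "Web 'professionals'", "some",
        "recognised as an issue", "improved"),
    "adulthood": StageProfile(
        "structural and presentational", "medium to large", "long",
        "evolutionary", "Web 'professionals'",
        "some, more detailed analysis needed", "planned", "planned"),
}


def changed_dimensions(earlier: str, later: str) -> list[str]:
    """Return the names of the dimensions whose values differ between two stages."""
    before, after = asdict(STAGES[earlier]), asdict(STAGES[later])
    return [name for name in before if before[name] != after[name]]
```

Under this encoding, comparing 'growing up' with 'adulthood' reports differences only in user analysis, scalability and maintainability, which matches the observation that the last stage is distinguished by planning rather than by focus or size.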



4 Comments and Discussion 

There are two specific but interlinked aspects to Web development within UWSM. 
The first one is the use of the Web itself as an innovation. A separate paper deals with
how the work on Diffusion of Innovations correlates with this case study. 

The second aspect is about changes in the development processes or, as the title 
suggests, ‘from process infancy to maturity’. Web development in UWSM forced 
changes in processes not only from ad hoc to planned and evolutionary form, but also 
broadened their attention from mostly technical factors to include organisational ones 
as well. There is also active recognition that the ‘who’ of development now includes 
non-IT people and that Web development is not solely in the charge of IT experts, even in
technological terms. 

If this second aspect, ‘from process infancy to maturity’, can be validly generalised 
then we are able to add to our knowledge of how to persuade people to adopt more 
systematic approaches specifically related to the stages of Web development.

Table 1. Maturity Levels and Characteristics of Web Development

Characteristics of                       Stages of Web Development
Web Development    Infancy                Early Childhood             Growing up                  Adulthood

Focus              Presentation           Presentation                Structure and Presentation  Structure and Presentation
Size               Small                  Medium                      Medium to large             Medium to large
Lifetime           Short                  Medium term                 Long                        Long
Process            One-off (ad-hoc)       Evolutionary                Evolutionary                Evolutionary
Developers         Enthusiasts, amateurs  IT experts and enthusiasts  Web ‘professionals’         Web ‘professionals’
User Analysis      None                   None                        Some                        Greater emphasis
Scalability        None                   None                        Recognised as an issue      Planned
Maintainability    Minimal                Some                        Improved                    Planned



4.1 Success Factors in the Adoption of Innovation 

The question of adoption of innovation is an interesting one and studied by many 
experts. Rogers [15] identifies the organisational variables which influence the 
successful adoption of technological, and in particular “computer innovations”. The 
variables which have a positive influence are: 

• Individual leader’s attitude toward change 

• Complexity or degree (of) high level knowledge and expertise 

• Interconnectedness, i.e. interpersonal networks 

• Organizational slack or availability of uncommitted resources 

• Size 

• System openness 

The variables which have a negative influence are: 

• Centralization (although this may encourage implementation) 

• Formalization or degree (of emphasis on) rules and procedures 

If we relate these variables to Web development and the case study, we find, in 
stages 2 to 4, a strong presence or conscious promotion of high level of knowledge 
and expertise, interconnectedness, system openness and decentralisation. The work of 
the two committees mentioned in the description may be interpreted as tending to 
formalisation, hence negatively influencing the innovation but, in practice, both the 
committees favoured decentralisation strongly and formalisation weakly.

4.2 Innovations and Practice

Rogers [15] quotes several other studies to highlight other aspects of adoption of 
innovation. Van de Ven [18] finds that “innovation almost never fits perfectly in the
organization” and in fact transforms the structure and practice of the environments, all 
of which requires “a fair degree of creative activity to avoid, or to overcome, the 
misalignments that occur between the innovation and organization.” 

Orlikowski [11] interprets technology in an organization “as the product of human 
interaction, as its meaning is gradually worked out through discussion” rather than as 
an objective and external force. 

We suggest that the case study demonstrates the validity of both observations. 
Stages 2 to 4 actively worked at avoiding ‘misalignments’ and involved a great deal 
of discussions both within the committees and outside. Although the technical 
expertise varied greatly across the University, our experience shows that Web 
development within the University units remained at different levels because of 
varying degrees of open dialogue and ‘creative activity to avoid the misalignments’ 
within each unit. Thus, B&T developers grew out of ‘infancy’ fairly quickly whereas
others did not. Technical training alone did not advance their Web development.



4.3 Recommendations 

The main recommendation combines Rogers’ observations [15] and our experience
during the course of the case study. We recommend that Web development processes
should encompass not only all the Web related aspects (‘look within’) but also other,
organisational ones (‘look without’). In doing so, Web development teams
will have to adopt an educational outlook and not restrict themselves to ‘doing a job
to the given specifications’. We also hazard a guess that what the case study 
demonstrates may turn out to be an early harbinger of ‘process literacy’ although it is 
unlikely ever to be on a level of other ‘literacies’. 



4.4 Current State of Web Development at UWSM 

As an outcome of this work, we now have an evolutionary model of Web 
development that will enable us to analyse the levels of process maturity in all the 
units, academic and non-academic, much more clearly. The technical and other 
aspects can be more sharply delineated and suitable procedures identified for adoption 
to advance both Web development and the processes needed for the purpose.



5 Conclusions and Future Work 

The case study described how the Web development in UWSM started as isolated, 
independent and ad hoc efforts and then evolved through stages to the point where a
significant number of ‘stakeholders’ understood and adopted the evolutionary 
processes to systematise a whole body of work. When compared with the findings of 
studies on innovation, it is clear that many organisational factors contributed 
significantly to the success of both Web adoption and process maturity.

There are several themes either hinted at or not at all considered here which need
to be explored more thoroughly and empirically. 

The paper outlines some of the challenges for Web developers, beyond those 
associated with the run-away technology and IT in general. These challenges need to 
be explored more thoroughly. 

A general lack of knowledge of Web Technology (and Engineering) means that 
both the ‘traditional’ IT and management are not necessarily in a good position to 
make correct decisions about Web adoption and the way to do it. The site design, for 
example, is more likely to proceed through rapid stages of development until 
sufficient understanding is reached of which processes suit the given context and in 
what order. 

The paper described the process maturity at UWSM, which is a collegial
organisation. These observations cannot be validly generalised to other types of organisations
(hierarchical or collegial, centralised or decentralised). Future work could attempt to 
broaden our understanding in relating development paths and speeds to organisational 
structure and philosophy.

Abstract. This paper presents reflections on the application of
evolutionary delivery methodologies to a dynamic, fast-paced Web
development environment. Conclusions and opinions in the paper are 
drawn from practical, industry-focused experience under the time-to- 
market pressures of the Internet industry. The methodology described 
offers a mix of rapid application development, evolutionary delivery 
and joint application development adapted to the unique climate of the 
Web. Aspects of projects utilizing the methodology are compared to 
other projects for evidence of improvement. The paper reinforces the 
need to approach Web development with a pragmatic reverence for its 
inherent uniqueness, but without losing sight of the lessons learned 
from existing software development methodologies. The methodology 
focuses intently on the requirements definition and early stages of Web 
projects.

1 Introduction

Web developers rarely pause long enough to ponder their methodologies, but those 
that do often fall into one of two camps. One is those who believe the Web is different 
enough to justify a completely new engineering approach. The other feels that Web 
engineering is just regular software engineering, and conventional engineering 
methodologies apply unchanged. 

In reality, the truth lies somewhere between. In the experience of the author, no 
conventional, “out of the box” process or methodology is perfectly suited to the 
peculiarities of Web development. However, a number of industry best practices can 
provide a solid foundation from which to design an adequate process. As reinforced 
by Powell [1], Web development is software development, and that realization
becomes vital as the industry makes the transition from a document-oriented approach 
to a software-oriented approach. As sites become mission-critical, as user populations 
grow exponentially, and as services mature and bring with them new maintainability 
concerns, managers and developers must begin to see the need for a more suitable 
Web-building methodology. The right process is one that properly marries tried and 
true engineering with an understanding of what makes the Web unique.

2 Process Overview

Many Web development shops have little structure or process in place to meet the 
need for sound engineering and maintainability [^. In fact, the author has found that 
many practitioners often emerge from self-taught “hacking” climates that repel any 
process as overwhelmingly burdensome red tape. Project success often has more to do 
with the brilliance and valor of the contributors than the process [^. Management 
usually rationalizes that in the competitive landscape of the Web, any delays of time- 
to-market could be catastrophic. However, such crude practices have proven to be 
poorly scalable, especially when development teams begin to grow as quickly as user 
populations. 

Moreover, traditional models often seem poorly equipped to handle the Web’s 
vague requirements, unproven technologies and high rate of project change. 
Nevertheless, we must be careful to remember the hard lessons that framed previous 
methods - something engineers have often forgotten during so-called “paradigm 
shifts” in the past 0. 

Can an engineering methodology emerge that meets the need for speed without 
sacrificing system integrity and the need for maintainability? The short answer is yes. 
Before such a system can be defined, however, it is important to examine what 
distinguishes Web application development from other forms of software engineering. 

Primarily, Web development is different because the people who build Web sites 
are different. During the implementation - or execution - phase of a Web project, 
many different functions operate in tandem. Web projects often seem to have as much
in common with magazine publishing as they do with conventional software
projects. Contributors often have widely varying backgrounds and approaches to 
development, and many can be characterized as “non-technical” (e.g., graphic design, 
marketing, and editorial). Unlike traditional projects where diverse work is performed 
sequentially, Web development tasks are often done in parallel. This is usually 
necessitated by the fact that Web sites are usually not discrete systems, but rather a 
blended convergence of code, visual design elements and editorial content. A great 
deal of communication and team synergy is required for success. The outcome is 
more collaboration with non-technical personnel during the implementation stage than
in a typical software project.

Secondly, a Web environment by its very nature is an immediate medium. The 
time-to-market lag from conception to delivery can be as short as a few hours. Web 
developers have the capability to modify their systems for all users immediately, 
without being impeded by the manufacturing, distribution and sales channel delays 
inherent in shrink-wrap software development. Consequently, Web development 
projects should be evolutionary in nature, with multiple staged deliveries throughout 
the lifecycle. This iterative process allows the product to meet the business need for 
rapidity while maturing the product in direct response to user feedback and usage 
patterns. Evolutionary delivery models have been cited in numerous studies as a more 
effective mechanism for meeting user needs, enabling open-architecture design and 
delivering systems on time |^. Thus, a Web engineering methodology should build 
upon a core of an evolutionary delivery model similar to those described by Gilb and 
McConnell 0. 

Based on the unique characteristics of Web development environments, a 
methodology that emphasizes cross-functional feature teams, collaborative product
conception and design and evolutionary delivery is most attractive. A Cross-Functional
Evolutionary Process (CFEP) has been developed, applied and refined
over the last two years. Figure 1 shows various phases of this process. 




Figure 1: Cross-Functional Evolutionary Process Components



2.1 Collecting User Requirements 

User participation in Web development - although more difficult to achieve than in 
traditional development - is just as essential to success 0. In our process, user needs
and requirements are collected in a number of ways. First, a massive amount of 
electronic mail is received each day by a customer support group. The group 
evaluates the submissions for trends and common requests. This “macro-feedback” is 
delivered to product managers and assessed for implementation potential. 
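
The trend evaluation described above is a manual activity, but its core step, counting how often each kind of request recurs, can be sketched in a few lines. The example below is a hypothetical illustration: it assumes the support group has already tagged each message with a request category, and the function name `common_requests`, the tag names and the threshold `min_count` are inventions for this sketch rather than part of the process described.

```python
from collections import Counter


def common_requests(tagged_messages, min_count=3):
    """Return (tag, count) pairs for request tags seen at least min_count times.

    tagged_messages is an iterable of (message_id, tag) pairs, where the tag
    was assigned by a support reviewer (an assumption of this sketch).
    Results are ordered from most to least frequent.
    """
    counts = Counter(tag for _, tag in tagged_messages)
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]
```

A threshold keeps one-off comments out of the "macro-feedback" summary; the appropriate value would depend on daily mail volume.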

Second, a group exists to study user behavior and needs in focus group and user 
testing settings. These “micro-feedback” sessions are studied closely for potential 
product impact, modifications and usability enhancements. 

Third, product managers often interact directly with end-users to understand their 
needs and requests. Usually this is done to solicit additional information about an 
issue uncovered in electronic mail or user testing, but in some cases product managers 
randomly select users for direct questioning or to complete questionnaires about the
product. In other cases, beta products are released to a limited community of users in
exchange for detailed comments and suggestions. 

Product ideas and initiatives - especially groundbreaking ones - are not always 
derived directly from user feedback. In many cases projects are motivated by the 
competitive landscape of the industry, revenue needs or “gut feelings.” 



2.2 Idea Formulation 

In the very early stages, a cross-functional Feature Team is assigned to the project, 
typically with representation from software development, quality assurance, interface 
design, graphics design, content or editorial, marketing and product development. 
Other departments participate depending on the characteristics or particularities of the 
project. The Feature Team investigates the product concept as currently conceived, or 
in many cases the permanent team itself initiates and fleshes out the concept.

Product managers lead the feature teams as organizers and as decision-makers when
disputes arise or consensus is lacking. However, the process differs from many other
methodologies in that essential personnel are assigned at the absolute earliest stages 
of the project, contributing to its definition immediately and reducing rework tied to 
misinterpretation. 



2.3 Collaborative Product Definition (CPD) Phase 




Figure 2: Collaborative Product Definition (CPD) Phase Detail 

The Feature Team defines the scope of the project and the characteristics of the 
product during the Collaborative Product Definition (CPD) phases. During these 
phases, the process draws heavily on brainstorming, facilitated workshop and
established Joint Application Development (JAD) techniques. The objective of
CPD is to identify the high-level attributes of the product based on the team’s 
understanding of user needs, technical feasibility and business objectives. 




52 Kenneth S. Norton 



2.3.1 Assemble Preliminary Requirements 

The team prepares a list of the product characteristics required for success. These are 
obtained via a combination of common sense, requests from senior management and 
information gleaned from end users. The initial requirements are typically prioritized 
in order of importance as “must-haves”, “should-haves” or “may-haves”. This initial 
planning often saves time during later stages by establishing a baseline of 
requirements that are essential to success. 
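
The three-level prioritization above can be sketched as a simple ordering over a requirements list. This is a minimal illustration under stated assumptions, not the team's actual tooling: the `Requirement` record, its field names and the helpers `prioritise` and `baseline` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Rank of each priority label; a lower rank is more important.
PRIORITY_ORDER = {"must-have": 0, "should-have": 1, "may-have": 2}


@dataclass
class Requirement:
    description: str
    priority: str  # one of the keys of PRIORITY_ORDER
    source: str    # e.g. "senior management" or "end users" (illustrative)


def prioritise(requirements):
    """Return requirements ordered must-have, should-have, may-have.

    The sort is stable, so items with the same priority keep their
    original relative order.
    """
    for req in requirements:
        if req.priority not in PRIORITY_ORDER:
            raise ValueError(f"unknown priority: {req.priority!r}")
    return sorted(requirements, key=lambda r: PRIORITY_ORDER[r.priority])


def baseline(requirements):
    """Return only the must-have requirements, i.e. the essential baseline."""
    return [r for r in prioritise(requirements) if r.priority == "must-have"]
```

Keeping the source alongside each requirement makes it possible to see later whether a must-have came from users, management or the team's own judgement.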

2.3.2 Determine Participants 

The project and product managers work with functional managers to determine which 
members of the Feature Team should participate in the Feature Workshop. Leaders 
strive to keep the number of participants to a manageable level, generally between 5 
and 10 - comparable to the ideal level of 7 to 15 derived from industry research [9]. 
Attendees are selected based on their functions, expertise and estimated level of 
participation. Meeting organizers also work carefully to ensure an even distribution of 
job roles. Functions that are typically required to attend include product management, 
software development, user interface, quality assurance and graphics design. 

2.3.3 Feature Workshop Session (JAD) 

Feature workshops are structured, facilitated meetings lasting from one hour to two 
full days. The agenda is circulated in advance, and the objective is to emerge with 
consensus and documentation of high-level product attributes. Facilitators follow 
much of the JAD methodology encapsulated by Wood and Silver. 

The typical Feature Workshop begins with an icebreaker followed by a review of 
the agenda. Included with the agenda is a list of deliverables. Deliverables usually 
include a Feature Document, action items or prototypes. Ground rules for 
participation and process are also established up front. These can be as simple as “one 
person speaks at a time” [9] or as complex as “discussion of features relating to the 
registration system are outside the scope of this session.” 

The session facilitator then follows with a presentation and review of the current 
state of the product, including the objectives, business model, assumptions and 
preliminary requirements. From there, the meeting moves to specific discussion and 
brainstorming around additional characteristics. Unlike in some JAD sessions, 
participants rarely perform prototyping during the Feature Workshop. Instead, the 
team makes extensive use of white boards and flip charts and saves prototypes for 
post-session investigation and decision-making. 

A designated scribe takes meticulous notes throughout the Feature Workshop. 
Scribes are often assigned to draft initial revisions of Feature Documents based on 
those records. The session ends with a review of the decisions and a discussion of 
assignments and next steps. 

Feature Workshops give developers a head start on projects by soliciting their 
involvement in the process of conception. By participating in decision-making, they 
are expected to have a greater understanding not only of the features themselves but also of
their context. JAD has become an effective mechanism for building cohesiveness 
among team members, encouraging pride of ownership and increasing 
motivation [p^.

2.3.4 Prototyping

Prototyping has proven to be a valuable method for conveying specific requirements 
to the development team, illustrating multiple implementation options and enabling 
controlled usability tests prior to release. Prototyping is dismissed as harmful to Web 
development by Powell [1] who alleges that iterating and evolving prototypes confuse 
and frustrate end users. However, prototyping can be used sensibly as an internal 
method of communication and clarification without forcing end users to endure 
anything except a final, released product. The development team may find it helpful 
to prototype product elements on replicas of the live environment designated for 
development, testing and staging purposes. 

Prototypes are also used by the Feature Team after Feature Workshops to present 
product ideas to other members of the company, partners, the press and advertisers. 
Product proposals are often best supported with working prototypes or “mockups” of 
alternatives. 

2.3.5 Feature Document Drafting and Review 

After the Feature Workshop is complete, features are transcribed into a Feature 
Document that serves as a record of the team’s accomplishments. The document is 
reviewed and accepted by the team, usually without necessitating a formal review. 
Feature Documents describe the product from the perspective of the user and the 
business, and serve as the foundation for Functional Specifications. The level of detail 
present in a Feature Document is usually moderate, and the structure tends to 
correspond to meeting minutes. Documents are intended to communicate the 
decisions made in Feature Workshops and supporting detail for those decisions. 
Therefore, granular characteristics are normally left to the Functional Specification. 



2.4 Requirements Definition 

After the Feature Workshop and the CPD phases, the totality of the product scope is 
reduced to tangible, achievable product deliverables in a Functional Specification. 
The specifications are written by technical members of the Feature Team to 
decompose high-level objectives from the Feature Document into specific 
requirements that will drive implementation. These documents serve as the single 
authority for the development of test plans and the measurement of project success. 
Functional Specification authors strive for traceability, correctness, simplicity, the 
avoidance of ambiguity and testability [11, 12]. The final specification is reviewed by 
the Feature Team and other stakeholders and baselined. 



2.5 Product Design 

The functional units next begin their design phases in parallel. On a typical project, 
this involves detailed software design, test plan design, and presentation design (user- 
interface and graphics). Depending on the project, this phase may also involve 
marketing and public relations strategy design and business development 
negotiations. The entire team maintains regular, comprehensive communication 
throughout this phase to ensure that designs are consistent and interfaces meet the 
needs of each group. The highlight of this phase is careful, complete design through 
the decomposition of elements defined during CPD. 



2.6 Implementation (or Execution) 

Development takes place in parallel, with project managers playing a logistical 
leadership role. Communication continues to be vital during this phase as regular 
meetings and communiques are arranged. Feature creep is actively controlled through 
a rigorous change management process and by delaying changes to later delivery 
phases. 



2.7 Testing 

The quality assurance group performs black box testing on the entire product, both 
functionality and presentation. White box (or glass box) testing is typically performed 
on projects with high risk or with high quality expectations. Tests are performed 
according to the test plans and defects are measured against the Functional 
Specification. 



2.8 Release 

The product undergoes a rigorous release process during a regularly scheduled release 
window. Members of the feature team collaborate on a launch plan and lead the 
formal handoff of the product to software operations, internal users and the 
production teams responsible for the ongoing maintenance of the product or feature. 



2.9 Next Phase Initiation 

Each release is typically the end of a single staged release in an evolutionary process. 
Therefore, the team immediately begins work on the CPD or design of the next staged 
release. If the phase represents the final stage of a product, the team is typically 
disbanded or reassigned. 



3 Results 

In the period that the process has been used and refined, we have encountered reduced 
frustration amongst team members as well as a greater sense of participation and 
group accomplishment. Such improvement is difficult to measure objectively, but 
there is empirical evidence of reduced rework and a higher rate of on-time project 
delivery. These results could potentially be attributed to a number of factors including 
the experience level of contributors and the generally improving maturity of the 
organization. Additional research could be performed in this area with participant 
questionnaires and surveys. 

However, the most intriguing results can be found by examining the duration of 
initial conception phases, or the “fuzzy front end” of projects [13]. This detailed 
examination helped to test the hypothesis that CPD and its emphasis on cross- 
functional communication is a faster mechanism for taking ideas to a level where they 
are ready for implementation. 

In this study, the “conception phase” was defined as all activities from the start of 
the project to the beginning of the Functional Specification stage. This corresponds to 
the completion of the Feature Document in the process described in this paper. The 
drafting of Functional Specifications has been a consistent aspect of the development 
process through all projects observed and is a suitable correlation point. To qualify for 
selection, a project was required to have had direct end-user observable impact (e.g., 
no infrastructure or maintenance projects were selected). Such projects require the 
greatest solicitation of end-user feedback and are therefore a better assessment of the 
entire process. In addition, the projects selected must have been staffed by one full- 
time equivalent each from software development, quality assurance, product 
management and interface design. We eliminated projects that were shorter than 20 
calendar days as these projects tend to have well-defined requirements at initiation. 
Finally, any projects that had length abnormalities due to known and unrelated causes 
were also removed from the study (e.g., projects spanning major holiday periods). 

The resulting data set consisted of 23 projects performed at a top twenty-five 
consumer Web destination site during a single 16-month period (1998-1999) (see 
Table 1). 

Of these 23 projects, 12 used the CPD process and 11 did not. The selected 
projects ranged from 20 to 177 calendar days with a mean of 73 days. To normalize 
for differences in complexity and resources, we observed the percentage contribution 
of the conception phase to the total project. In projects that did not apply the CPD 
process, the contribution ranged from 31% to 66% with a mean of 48%. In projects 
where the CPD process was applied, the range was 13% to 37% with a mean of 26%. 
Additionally, CPD projects were 31% shorter than non-CPD projects, with a mean of 
59 versus 87 calendar days. 
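
The summary figures above can be recomputed from the durations listed in Table 1. The following is a minimal sketch, not part of the original study: the names `CPD`, `NON_CPD` and `summarize` are illustrative, and the tuples are transcribed by hand from the table, so small differences from the rounded percentages quoted in the text are to be expected.

```python
from statistics import mean

# (total project days, conception phase days), transcribed from Table 1.
# CPD = projects A-L, NON_CPD = projects M-W.
CPD = [(36, 10), (70, 26), (38, 5), (42, 15), (86, 11), (23, 8),
       (34, 5), (78, 17), (115, 35), (137, 51), (34, 7), (20, 4)]
NON_CPD = [(54, 18), (119, 61), (42, 14), (40, 18), (24, 16), (177, 116),
           (91, 49), (89, 45), (51, 16), (136, 44), (144, 89)]


def summarize(projects):
    """Return mean duration and conception-phase share for a project group."""
    shares = [100 * conception / total for total, conception in projects]
    return {
        "count": len(projects),
        "mean_days": mean(total for total, _ in projects),
        "mean_share": mean(shares),
        "min_share": min(shares),
        "max_share": max(shares),
    }


if __name__ == "__main__":
    for label, group in (("CPD", CPD), ("non-CPD", NON_CPD)):
        s = summarize(group)
        print(f"{label}: n={s['count']}, mean {s['mean_days']:.1f} days, "
              f"conception share {s['mean_share']:.0f}% "
              f"({s['min_share']:.0f}%-{s['max_share']:.0f}%)")
```

Normalizing by total project length, as the text does, keeps a long project from dominating the comparison simply because every one of its phases is long.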

Without the benefit of detailed metrics such as KLOCs or function points, it is 
difficult to draw concrete conclusions about project length as it correlates to the use of 
the CPD process. However, the reduction of time spent in the “fuzzy front end” of 
projects is promising and merits additional research. 



4 Conclusions 

Cross-functional teams, facilitated workshops and evolutionary deliveries provide a 
solid foundation upon which to base a methodology for Web engineering. When 
applied judiciously, this process can help an organization meet its business objectives 
without sacrificing sound engineering practices and maintainability. Such a process 
also helps build effective teams by involving participants in all phases of development 
and harnessing their creativity throughout the product lifecycle. Additionally, the 
process can reduce time spent formulating and communicating product characteristics 
and objectives, allowing implementation to begin earlier. The process draws 
effectively on established industry best practices, but is executed and refined in a 
manner that accounts for the idiosyncrasies of the Web. 
Table 1: Observed Projects 

                 Total Project                  Conception Phase 
Name        Start      Finish     Duration      End        Duration   % Total 

Used Collaborative Product Definition 
Project A   4/6/98     5/12/98    36            4/16/98    10         28% 
Project B   5/13/98    7/23/98    70            6/9/98     26         37% 
Project C   10/1/98    11/9/98    38            10/6/98    5          13% 
Project D   10/5/98    11/17/98   42            10/20/98   15         36% 
Project E   9/14/98    12/10/98   86            9/25/98    11         13% 
Project F   11/23/98   12/16/98   23            12/1/98    8          35% 
Project G   11/18/98   12/22/98   34            11/23/98   5          15% 
Project H   11/23/98   2/11/99    78            12/10/98   17         22% 
Project I   12/1/98    3/26/99    115           1/6/99     35         30% 
Project J   11/20/98   4/7/99     137           1/11/99    51         37% 
Project K   3/5/99     4/9/99     34            3/12/99    7          21% 
Project L   3/22/99    4/12/99    20            3/26/99    4          20% 

Did Not Use Collaborative Product Definition 
Project M   1/16/98    3/10/98    54            2/4/98     18         33% 
Project N   12/1/97    3/30/98    119           2/2/98     61         51% 
Project O   2/27/98    4/9/98     42            3/11/98    14         33% 
Project P   3/25/98    5/5/98     40            4/13/98    18         45% 
Project Q   6/20/98    7/14/98    24            7/6/98     16         67% 
Project R   1/19/98    7/16/98    177           5/15/98    116        66% 
Project S   5/4/98     8/5/98     91            6/23/98    49         54% 
Project T   5/28/98    8/27/98    89            7/13/98    45         51% 
Project U   8/10/98    10/1/98    51            8/26/98    16         31% 
Project V   5/15/98    10/1/98    136           6/29/98    44         32% 
Project W   5/21/98    10/15/98   144           8/20/98    89         62% 


Acknowledgements 

This paper is based on data gathered while the author was employed at CNET and 
Snap.com. The author would like to acknowledge the invaluable assistance of Paul 
Heyburn, Mark Schlagenhauf, Gregory Sherwin and the rest of the engineering teams 
at those organizations. 







References 



1. T.A. Powell, Web Site Engineering, Prentice Hall, Upper Saddle River, N.J., 
1998, pp. 13-17. 

2. D. Lowe and W. Hall, Hypermedia and the Web: An Engineering Approach, John 
Wiley, West Sussex, England, 1999, pp. 212-213. 

3. M. Paulk, B. Curtis, M. Chrissis and C. Webber, Capability Maturity Model for 
Software (Version 1.1), Software Engineering Institute, Carnegie Mellon 
University, Pittsburgh, 1993, p. 15. 

4. R. Pressman, Can Internet-Based Applications Be Engineered?, IEEE Software, 
Sept./Oct. 1998, p. 105. 

5. T. Gilb, Principles of Software Engineering Management, Addison-Wesley, 
Wokingham, England, 1988, pp. 84-114. 

6. S. McConnell, Rapid Development, Microsoft Press, Redmond, Wash., 1996, pp. 
425-432. 

7. A. R. Dennis, Lessons from Three Years of Web Development, Communications 
of the ACM, July 1998, p. 113. 

8. P. H. Jones, Handbook of Team Design, McGraw-Hill, New York, 1998. 

9. J. Wood and D. Silver, Joint Application Development, John Wiley & Sons, New 
York, 1995. 

10. K. E. Chin, A JAD Experience, in (Ed) L. Olfman, Supporting Teams, Groups, 
and Learning Inside and Outside the IS Function, SIGCPR/ACM, Nashville, 
1995, p. 235. 

11. D. C. Gause and G. M. Weinberg, Exploring Requirements: Quality Before 
Design, Dorset House, New York, 1989. 

12. K. Wiegers, Writing Quality Requirements, Software Development, May 1999, 
pp. 44-48. 

13. P. G. Smith and D. G. Reinertsen, Developing Products in Half the Time, Van 
Nostrand Reinhold, New York, 1998, pp. 49-65. 




Development and Evolution of Web-Applications Using 
the WebComposition Process Model 



Martin Gaedke and Guntram Gräf 

Telecooperation Office (TecO), University of Karlsruhe, 
Vincenz-Priessnitz Str. 1, D-76131 Karlsruhe, Germany 
{gaedke, graef}@teco.uni-karlsruhe.de 



Abstract. From a software engineering perspective the World Wide 
Web is a new application platform. The implementation model that the 
Web is based on makes it difficult to apply classic process models to 
the development and even more the evolution of Web-applications. 
Component-based software development seems to be a promising 
approach for addressing key requirements of the very dynamic field of 
Web-application development and evolution. But such an approach 
requires dedicated support. The WebComposition Process Model 
addresses this requirement by describing the component-based 
development of Web-applications. It uses an XML-based markup 
language to seamlessly integrate with existing Web-standards. For the 
coordination of components the concept of an open process model with 
an explicit support for reuse is introduced. By describing application 
domains using domain-components the process model addresses the 
need for a controlled evolution of Web applications. 



1 Introduction 

By supporting ubiquitous access to any kind of information and applications the 
World Wide Web (Web) has become the dominant platform for the delivery of 
hypermedia applications. Applications are subject to permanent change triggered by 
increasing competition, especially in commercial domains such as electronic 
commerce. Changes affect but are not limited to functionality, application interfaces 
and content [5]. These applications, referred to as Web-Applications, are strongly 
influenced by the specific properties of the implementation model they are based on. 
The high speed of innovation shortens the life cycle of Web-applications because 
applications are forced to undergo a permanent evolutionary process. Nevertheless, 
the increasing complexity of Web-applications is still addressed with a rather 
unstructured approach for application development and evolution [1] [19]. 

It becomes clear that the construction and evolution of applications for the World 
Wide Web requires support similar to that available for traditional applications 
through the models, methods and principles of software engineering. The World Wide 
Web with its particular characteristics and properties has become a new application 
domain of software engineering [10] that still needs a sound theoretical foundation. 
This new discipline, which has been established during the previous two years as Web 
Engineering, promises to both reduce costs and increase quality during the 
development and evolution of Web-applications: 

Web Engineering - The application of systematic, disciplined and 
quantifiable approaches to the cost-effective development and 
evolution of high-quality applications in the World Wide Web. 

Web Engineering implicitly considers Berners-Lee's central demand for 
heterogeneity of the system and autonomous administration of its resources [2]. This 
demand, which we will refer to as the basic principles of the Web, is a major obstacle for 
current approaches to the development and maintenance of Web-applications, as 
will become obvious in section 2 of this contribution. Furthermore, a fine-grained and 
reuse-oriented implementation is necessary to allow for the federation of existing 
Web-applications or application parts into new applications [14]. 

The positive experiences with component-based software development and its 
advantages [32] [30] [46] [45] make it desirable to be able to use a dedicated 
component technology for the development and evolution of Web-applications. This 
is also a prerequisite for taking full advantage of applying modern reuse-oriented 
software engineering processes to Web-technology. Besides an adequate 
component technology for the Web this also implies a process model in the sense of a 
software development model that describes the component-based construction and 
evolution of Web-applications on the basis of the basic principles of the Web. 

Existing process models for the development of Web-applications are discussed in 
the next section of this contribution. In section 3 a component-based process model is 
introduced that supports the reuse of components and that is in concordance with the 
basic principles of the Web. It models the evolution of a Web-application on the basis 
of dedicated domain-components. In section 4 a real application system is briefly 
described that has been developed and evolved according to the WebComposition 
process model and with the help of the WCML component technology. The 
contribution concludes with a short summary. 



2 Web Engineering Approaches to Software Reuse 

A new engineering approach is needed for the disciplined and reuse-oriented 
development of Web-applications. In this section we will therefore investigate process 
models for software development as well as dedicated process models for the Web. 



2.1 Process Models in Software Engineering 

The best known process models, such as the waterfall model, explorative process 
models [44], the prototype model and the spiral model [4], are applicable to the 
development of Web-applications only with serious constraints [35] [31]. The strong 
dynamics of change that especially apply to large and long-living Web-applications, 
the distributed nature of the medium as well as the development process, and the basic 
principles of the Web lead to problems hindering the evolution of Web-applications. 

Since the end of the 1980s several new models have been introduced that, while 
based on the classical process models, focused on the object-oriented development of 
software systems. Object-oriented development turns away from function-oriented 
approaches as they have been suggested by e.g. Yourdon [47] and DeMarco [7]. 
Systems that have been developed on the basis of decomposition of functionality are 
often subject to tremendous changes when functional requirements change. In contrast 
it is possible to achieve a much higher consistency throughout the various steps of the 
process if object orientation in combination with an appropriate specification model is 
used. This also allows for an iterative approach. 

Well known examples of this class of process models are the Semantic Object 
Modeling Approach [22], the Objectory Software Development Process or 
Unified Software Development Process [28], and the OPEN Process [23]. 
Unfortunately none of them explicitly supports software reuse [32] [30] [46]. The 
basic principles of the Web as well as the paradigm of viewing software as a 
component system are further obstacles, since in this case heterogeneous and 
orthogonal processes have to be considered for the evolution of different components 
in heterogeneous environments. 



2.2 Dedicated Process Models for the Web 

2.2.1 Hypertext Design Model 

The Hypertext Design Model (HDM) [16] [17] is a model for the structured design of 
hypertext-applications. Therefore it describes a design model rather than a process 
model. Still, it supports the design phase of hypermedia-applications and can be 
integrated with existing process models. HDM requires that the application uses a 
consistent and predictable reading environment. 

An advantage of HDM is its ability to decompose hypertext-applications into fine- 
grained artifacts for reuse while considering ensuing structural relations. 
Nevertheless, the applicability of HDM is severely limited by its methodology. The 
developer is subject to a high cognitive load caused by an unsatisfactory process 
model that neither supports artifact reuse nor allows for modeling artifacts using an 
object-oriented paradigm. The mapping of a hypertext-application designed with 
HDM to a Web-application is difficult due to the lack of an appropriate support. 

2.2.2 JESSICA 

The project JESSICA [1] tries to cover the complete life cycle of a Web-application 
including analysis, design, implementation and maintenance. Schranz et al. [41] 
suggest an object-oriented approach that is based on Object Oriented Analysis and 
Design by Yourdon [48] and the Object Oriented Modeling Technique (OMT) by 
Rumbaugh et al. [39]. The concept of use cases is applied during analysis. The 
Unified Modeling Language (UML) is used to specify results during analysis and 
design. 

The JESSICA system provides a modeling language based on the Extensible Markup 
Language (XML) and a mechanism for the automatic mapping from the design model 
to Web-resources. The design entities described using the JESSICA language are 
available for management and maintenance throughout the whole system lifecycle. 
The JESSICA language is object-based and as such provides only some object-oriented 
concepts for the abstract description of artifacts of Web-applications. Through 
the use of templates and JESSICA objects the concepts of abstraction, encapsulation, 
aggregation and inheritance are made available. References between corresponding 
objects are resolved by the JESSICA system and mapped to HTML links. 

JESSICA objects are simple sets of attributes. Many concepts and notations of the 
UML can therefore not easily be applied. Method calls are not supported because 
executable entities do not exist. The design is supported with a special, functionally 
limited UML editor that has been adapted to the JESSICA system. The JESSICA 
method does not explicitly support reuse, but JESSICA objects can be reused in 
heterogeneous environments since their description is based on XML. A separate 
application evolution is not considered due to the use of classical process models. 

2.2.3 Object-Oriented Hypermedia Design Method 

In contrast to HDM, the Object-Oriented Hypermedia Design Method (OOHDM) [43] 
offers a clearly defined procedure for the development of hypermedia-applications. 
OOHDM consists of four steps: conceptual design, navigational design, abstract 
interface design and implementation, which have to be executed according to an 
iterative and incremental process model. 

A Web-application can be described with the help of three models [42]. The 
conceptual model corresponds to a traditional object-oriented model and describes 
design entities using UML notation, the navigation model describes the navigational 
view on the conceptual model and the abstract interface model describes the 
presentation of interface objects. 

The modularity and reusability of design concepts is relatively high due to the high 
degree of abstraction found in the resulting models. On the other hand, the 
generality of the modeling approach tends to lead to a higher complexity. For example, 
the method explicitly supports the use of design patterns but does not support the 
retrieval of patterns nor does it provide assistance for automating the implementation 
of reusable design artifacts. 

OOHDM considers several important aspects of Web-applications but is lacking 
system support that adequately corresponds to the basic principles of the Web. This 
implies many problems, such as the implementation in heterogeneous environments 
and the integration of distributed objects and artifacts. 

2.2.4 Relationship Management Method / RMCase 

The Relationship Management Method (RMM) by Isakowitz et al. [27] is a platform- 
independent process-model for hypermedia applications that can also be used for the 
development of Web-applications. The process model consists of seven detailed 
phases. An important aspect associated with the use of this process model is the 
support through a tool. 

RMCase [8] is a CASE tool that supports the complete life cycle of a Web- 
application. It is structured into a series of contexts that provide different views on the 
design objects. Some of these views correspond to models associated with single 
phases of the process model. During the design phase a special language supports the 
definition of a data-model adapted to hypertext-systems. While this data-model is not 
equally well suited for all kinds of Web-applications it greatly facilitates the mapping 
of relational database content to the Web. 

Unfortunately, the problem of integrating seamless evolution into the lifecycle of a 
Web-application remains unsolved. Due to the nature of the model, maintenance and 
reuse are mainly limited to data rather than artifacts or components and are subject to 
restrictions through certain initially defined structures. Reuse of artifacts in other 
processes is not explicitly supported. 

2.2.5 WSDM 

The Web Site Design Method (WSDM) [6] is focused on user-centered rather than 
data-centered design. The design is driven by the views of different user-classes 
instead of the available data. 

The process model is limited to a class of pure information systems, so-called 
"kiosk Web-sites". It does not support the user-class "application Web-sites" that 
encompasses the remaining often more complex application systems. Typical problem 
areas such as maintenance are not explicitly addressed. As in other models, there is 
no explicit support for reuse. 



2.3 Comparison of the Process Models 

Table 1 briefly summarizes the most important aspects of the dedicated process 
models for the Web that have been discussed in this section. The following criteria 
have been applied: 

• Consistency (C): Describes how easy it is to migrate entities 
between models of different process steps. 

• Web-Characteristics (W): Indicates whether the process model provides support for 
Web-specific characteristics, e.g. links and the coarse-grained implementation 
model. 

• Gap (G): Indicates whether the approach supports mapping a design to the 
implementation model of the Web. 

• Loyalty to principles (L): Describes how well the process model is 
in concordance with the basic principles of the Web, e.g. whether the process 
model allows reuse of artifacts developed in orthogonal processes. 

• Explicit Reuse (R): Indicates whether the model explicitly forces 
reuse of entities. 

• Evolution Plan: Describes whether the process model supports a 
persistent consideration of evolution, as supported in modern software 
engineering process models, cf. Domain Engineering. 



3 WebComposition Approach 

In this section we will describe the WebComposition approach consisting of the 
WebComposition Process Model, reuse management, a component technology, and 
evolution planning based on domain engineering and an evolution model. 
Table 1. Comparison of the process models 

Fig. 1. Evolution spiral of the WebComposition Process Model 



3.1 WebComposition Process Model 

The WebComposition Process Model consists of several phases. The phases are 
derived from the common phases of modern (object-oriented) process models as well 
as solutions addressing the need of software reuse. The Process Model follows an 
evolution spiral consisting of evolution analysis and planning, evolution design and 
evolution realization (Figure 1). 
The WebComposition Process Model is an open process model. This means that 
within the evolution spiral for the application various orthogonal processes following 
different process models can exist. The importance of openness will be detailed in the 
following paragraphs. 

As an example, a development team could develop a component according to the 
Waterfall Model while another team favors a different process model for its own 
problem domain. Figure 2 shows the coordination of different processes through the 
WebComposition Process Model and illustrates its openness. 

The coordination of the models is made possible by an explicit and coordinated 
reuse management [11]. All process models must therefore be adapted to the reuse 
management, which is possible, for example, through generalization [32]. This seeming 
disadvantage of generalizing process models only applies to the storage and 
management of artifacts, which is in any case part of the application of a process 
model. 




Fig. 2. Coordination of orthogonal processes in the WebComposition Process Model 

Figure 3 details the view of the process model for the development team that uses 
the waterfall model. The phases have been adapted to the WebComposition Process 
Model. 

The bi-directional arrows symbolize the coordination of artifact exchange. Through 
the adaptation of a model the reuse management gains access to all important 
documents, components, code-fragments etc. that are of importance within the 
process or in the lifecycle of the considered component. By integrating all processes 
of the different process models with the WebComposition Process Model the reuse 
management can determine the state of all components of a component software or a 
Web-application and can use this information for evolution. 

The consistency of object-oriented process models is based on the assumption that 
one model is jointly used in all phases. In WebComposition this basic idea is extended 
towards the coordination of different processes in component software. A 
precondition for this is therefore the definition of a model that can be used for the 
different process models, especially the object-oriented ones, as well as for the 
coordination of the artifacts from the different phases of all processes. It seems 
straightforward to choose an object-oriented model for the adaptation within the 
WebComposition Process Model. This adaptation model and the reuse management 
will be described following this sub-section. Therefore we assume their existence 
when describing the WebComposition Process Model here. 




Fig. 3. Waterfall adaptation within the WebComposition Process Model 

The application of a process model for the production of software systems implies 
the existence of different artifacts that serve as criteria for the advancement from a 
process step to the step that succeeds it. From a component perspective it is 
natural to regard all artifacts within this process as components and to ideally 
simulate their "creation" through reuse. Examples of reusable artifacts/components 
are the project plan produced after the analysis phase, design patterns [15], 
code-fragments created during the implementation phase, and coding and testing 
policies that are used during the implementation and testing phases. A component 
system is created via the composition of these components. 

3.1.1 Dynamic Aspects within the Process Model 

For the integration of a process model into the WebComposition Model the artifacts 
that are created during the process must be compatible with the WebComposition 
Process Model. To ensure this the artifacts must be adapted to the model with which 
artifacts are described within the WebComposition Model. Only by description of all 
artifacts on the basis of a uniform model can these be reused within processes of 
different process models. The use of a uniform model requires in this case at most two 
transformations of an artifact. First the artifact of the producer process is transformed 
to the WebComposition Model from where it can later be transformed into an artifact 
of another process. Thus for the description of artifact types in a model an object- 
oriented approach is suitable. 
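
The "at most two transformations" property can be illustrated with a small hub-and-spoke sketch. Everything here is hypothetical and only meant to make the idea concrete: `UniformArtifact`, `ProcessAdapter`, `exchange` and the two dictionary layouts are assumptions for illustration, not the WebComposition Model or its markup language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class UniformArtifact:
    """Hypothetical stand-in for an artifact described in the uniform model."""
    kind: str                     # e.g. "project-plan", "code-fragment"
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


class ProcessAdapter:
    """Adapts one process model's native artifacts to and from the uniform model."""

    def __init__(self,
                 to_uniform: Callable[[dict], UniformArtifact],
                 from_uniform: Callable[[UniformArtifact], dict]) -> None:
        self.to_uniform = to_uniform
        self.from_uniform = from_uniform


def exchange(native: dict, producer: ProcessAdapter, consumer: ProcessAdapter) -> dict:
    """Move an artifact between two processes using exactly two transformations."""
    uniform = producer.to_uniform(native)      # transformation 1: producer -> uniform
    return consumer.from_uniform(uniform)      # transformation 2: uniform -> consumer


# Two illustrative process models with different native artifact layouts.
waterfall = ProcessAdapter(
    to_uniform=lambda d: UniformArtifact(d["type"], d["title"], {"phase": d["phase"]}),
    from_uniform=lambda a: {"type": a.kind, "title": a.name,
                            "phase": a.properties.get("phase", "")},
)
spiral = ProcessAdapter(
    to_uniform=lambda d: UniformArtifact(d["category"], d["id"], {"phase": d["cycle"]}),
    from_uniform=lambda a: {"category": a.kind, "id": a.name,
                            "cycle": a.properties.get("phase", "")},
)

plan = {"type": "project-plan", "title": "order-entry", "phase": "analysis"}
received = exchange(plan, waterfall, spiral)
```

With a shared intermediate model, each of n process models needs only one adapter pair (2n converters in total) instead of a dedicated converter for every ordered pair of models (n(n-1) converters).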

From a reuse perspective all phases must begin with the question whether or not 
there already exists an appropriate solution that can be reused, either directly or in a 
modified version. As an example we will describe the development of an order entry 
system for mobile phones. During analysis it is stated that an appropriate program 
does not exist and that the design would include two main functions: the input of 
customer data and the selection of a telephone to be ordered. During the subsequent 
step it is found out that there already exists an appropriate component for the input of 
customer data. The development of the new Web-application/component therefore 
includes the reuse of an already existing component for customer data input. Figure 4 
illustrates the process and also shows the influence from the reuse process. 

Fig. 4. Consumer-Reuse - Evolution-realization

It is obvious that the WebComposition Process Model is characterized by a 
standardized form of artifacts and components. Through the open interface, which is
standardized within the reuse management, arbitrary process models can be
integrated. This way the requirements deriving from the basic principles of the Web 
can be met. 

3.1.2 Immanent Aspect of the Process Model 

The immanent aspect of the WebComposition Process Model targets the management 
and support of reuse. Therefore, the different processes must be coordinated, as 
described above. This means design artifacts must be managed within the 
WebComposition Model. The interface between the reuse management and a process 
is based on the shared manipulation of artifacts/components. Figure 5 details the reuse 
management according to the WebComposition Process Model. 

Fig. 5. Immanent aspect of reuse management

The reuse model has been coupled with the process models via a messaging 
mechanism for the interchange of components and knowledge about components. The 
messages can also be modeled as components, which allows the component model to 
also be consistently applied here. Furthermore, realizing reuse via the concept of
message interchange is in accordance with the distributed nature of the Web.
It is therefore also possible for the processes themselves to be distributed.
In the remainder of this section the WebComposition Model and the reuse
management will be detailed further.



3.2 Reuse Management 

In this sub-section a reuse model will be described that details the WebComposition
Process Model. It is motivated by the so-called component dilemma [24], which states
that the probability of owning a component that can be used to solve a specific problem
increases with the number of available components while at the same time the effort
needed to locate such a component within the set of available components increases 
as well. The retrieval of components in libraries is therefore a widely discussed 
problem [36] [37] [33] [25]. 

3.2.1 Separation of Model and Representation 

Representations deliver information about components to a searching instance or 
provide classifications and groupings of components. Representations also contribute 
to the technical solution of the problem "Finding components" [9] [37]. Frakes and 
Pole have analyzed the use of different representations in the context of an empirical 
study [9]. One aspect has been the comparison of representations in terms of 
effectiveness and efficiency. The main result of the study is that each representation 
was rated by the participants as the best and the worst depending on the task to be 
performed. No representation was considered suitable for all tasks. Thus, the study 
concludes that as many representations of a component as possible should be stored to 
best suit the needs of different users (Consumer Reuse) performing different tasks. To 
achieve this it is necessary to separate model and representation. 

The separation of a model and the view on a model is a well-known design pattern 
[15]. A representation displays information about components. We also call this
kind of information meta data. Thus, strictly speaking, we need to consider
at least two separate models, one for components and one for meta data. A
representation can then use a meta data model to supply a result from the component 
model. 

The requirement for arbitrary representations implies that more than one meta
data model and corresponding representations could exist and that several
representations may need to access the same meta data. The solution therefore is
based on a dynamic concept that allows for a loose coupling of models and 
representations. Based on the concepts used so far, which are all in accordance with 
the basic principles of the Web, this coupling mechanism can be considered to be a 
message interchange between meta model and representation. This approach will be 
detailed in the following sub-sections. 

The basic idea is to only use the minimum of requirements that must be defined to 
enable the collaboration of meta data, components and means of representation. The 
minimum requirement is that each component can be uniquely identified. This 
requirement is already part of the definition of a component and therefore always 
satisfied. 
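
As a minimal sketch of this separation, the following assumes two plain dictionaries as the component model and the meta data model, linked only by a unique id, and two invented representations (keyword_view and phase_view) that read the same meta data. The names and the meta data fields are illustrative assumptions, not part of the WebComposition repository.

```python
import uuid
from typing import Dict, List

# Component model: the only assumption is that every component has a unique id.
components: Dict[str, str] = {}
# Meta data model, stored separately and keyed by the same ids.
meta_data: Dict[str, Dict[str, str]] = {}


def add_component(body: str, **meta: str) -> str:
    """Store a component and its meta data under a freshly generated id."""
    cid = str(uuid.uuid4())
    components[cid] = body
    meta_data[cid] = dict(meta)
    return cid


def keyword_view(keyword: str) -> List[str]:
    """Representation 1: ids of components whose meta data list a keyword."""
    return [cid for cid, m in meta_data.items()
            if keyword in m.get("keywords", "").split()]


def phase_view() -> Dict[str, List[str]]:
    """Representation 2: the same meta data grouped by life-cycle phase."""
    groups: Dict[str, List[str]] = {}
    for cid, m in meta_data.items():
        groups.setdefault(m.get("phase", "unknown"), []).append(cid)
    return groups


form_id = add_component("customer data input form",
                        keywords="customer input", phase="implementation")
plan_id = add_component("project plan text",
                        keywords="plan", phase="analysis")
```

Both views return only component ids, so a further representation can be added without changing the component model or the existing views.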

3.2.2 Reuse Model

The two different views towards reuse, consumer view (Consumer Reuse; 
Development with reuse) and producer view (Producer Reuse; Development for 
reuse), are integrated into a consistent process model that complies with the existing 
interfaces of the WebComposition model. 

The starting point for the conception of the reuse model is the aforementioned
process model of Prieto-Diaz, since it is well suited to be concretized for Web
Engineering tasks. The disadvantage of this model is that a concrete representation is
assumed. In the following sub-section a new reuse model will be introduced by
generalizing the reuse model of Prieto-Diaz. It will then be concretized for use with 
the WebComposition Process Model. The necessary system support will be available 
through a reuse-repository described in [11]. 

3.2.3 Consumer View of the General Reuse Model 

The reuse model is generalized by generalizing the specification, i.e. the model no
longer predefines what kind of specification to use. While the original reuse model
required the specification of the functionality of a component, this requirement is
replaced by an undefined but selectable property. This allows for the use of arbitrary
component properties for the selection of a component without violating the reuse
model. For the selection of a component the modified reuse model assumes the
existence of a set of components. The organization of this set is not considered
further, so that the reuse model does not define any constraints that would limit it
to a certain component technology. Only the uniqueness and identifiability required
by the definition of a component are used.

Henninger describes in [25] that at the beginning of the reuse process often no 
complete and sufficiently qualifying request can be formulated that could lead to the 
desired component. The iterative approach through request-respecification often used 
by large search engines or in libraries has proven useful in this respect. A potential 
reuser can use the knowledge gained through the request to specify a refined request. 
While the original reuse model specifies a measure for the minimum of similarity a 
component must show in respect to the search request before it can be suggested to a 
potential reuser, the generalized model does not define such a measure. The 
determination of the result set neither needs to be computable nor verifiable. This is 
an important requirement for linking arbitrary representations to the reuse model.
Figure 6 shows a tool that enables the retrieval of components through a graphical
user-interface. With the help of this tool it is possible to navigate the component set 
via a graph with nodes representing components and edges representing prototype- 
instance relations between components. 
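
The iterative request-respecification over arbitrary, selectable properties can be sketched as follows. The component set, its property names and the refine helper are assumptions made for illustration; the generalized model itself prescribes neither a property vocabulary nor a similarity measure.

```python
from typing import Callable, Dict, List

Component = Dict[str, str]
Request = Callable[[Component], bool]

# An assumed component set; only the "id" is required to be unique.
library: List[Component] = [
    {"id": "c1", "domain": "ordering", "kind": "form", "language": "en"},
    {"id": "c2", "domain": "ordering", "kind": "form", "language": "de"},
    {"id": "c3", "domain": "ordering", "kind": "report", "language": "en"},
    {"id": "c4", "domain": "catalog", "kind": "form", "language": "en"},
]


def select(candidates: List[Component], request: Request) -> List[Component]:
    """Return the candidates that satisfy one request."""
    return [c for c in candidates if request(c)]


def refine(candidates: List[Component],
           requests: List[Request]) -> List[List[str]]:
    """Apply successive requests; each one narrows the previous result set."""
    history: List[List[str]] = []
    for request in requests:
        candidates = select(candidates, request)
        history.append([c["id"] for c in candidates])
    return history


# The reuser starts with a coarse request and respecifies it twice.
steps = refine(library, [
    lambda c: c["domain"] == "ordering",
    lambda c: c["kind"] == "form",
    lambda c: c["language"] == "en",
])
```

Each request is an arbitrary predicate, so any property of a component can serve as selection criterion.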

3.2.4 Producer View of the General Reuse Model 

From the perspective of the component producer a mechanism is required to define 
meta data for a component to enable a classification according to its functionality. 
This can be achieved by manual specification of the attributes to describe a 
component or through an automatic process that produces meta data either using other 
sources of information (e.g. development system, documentation system) or by 
analyzing the component itself. The automation can take place on a semantic
as well as a syntactic level. The meta data can also contain quality attributes or
certification information.

Fig. 6. Graphical support for component retrieval



3.3 WebComposition Component Technology 

In this section we introduce a dedicated component technology that satisfies the 
requirements of the previous models and that enables the application of the 
WebComposition Process Model. 

The description of components and relationships between components takes place 
with the help of the WebComposition Markup Language (WCML) [12], which itself 
is an application of XML. WCML introduces components as a uniform modeling 
concept for artifacts during the life cycle of Web-applications. The model supports an 
automated mapping to entities of the implementation model while still allowing for a 
detailed manipulation of that process. It also allows the modeling of e.g. analysis 
artifacts, design artifacts or reuse products through an object-oriented concept and 
various mechanisms for abstraction. 

Within the WCML model a Web-application can be described as a composition of 
components. From the perspective of the progress of different processes Web- 
applications can consist of a hierarchy of finished components that each correspond to 
whole parts of Web-applications with resources or fragments of resources. On the 
other hand for certain parts of a Web-application only information from analysis or 
design may exist. The latter components are part of the application and are
an indication of how the application will or can be developed further. Components
within the WCML-model are identifiable through a "Universally Unique Id" (UUID).
They contain a state in the form of typed attributes, called properties, which resemble
simple name-value pairs. The value of a property can be defined through static text or
WCML language constructs and must correspond to the data-type of the property.

The aforementioned concept of developing component software by composition
is realized with the WCML language constructs. Each component can be based on 
any number of other components and use their behavior by referencing them or by 
using them as prototypes. A modification of a component can therefore consistently 
change all components that use it. A detailed description of WCML can be found in 
[12] [10]. 
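
The effect of prototype relations can be illustrated with a small sketch. It is written in Python rather than WCML, and the Component class, its get method and the property names are assumptions for illustration only; the actual WCML constructs are described in [12] [10].

```python
import uuid
from typing import Any, Dict, Optional


class Component:
    """Illustrative component: a unique id, properties, optional prototype."""

    def __init__(self, prototype: Optional["Component"] = None,
                 **properties: Any) -> None:
        self.uuid = str(uuid.uuid4())
        self.prototype = prototype
        self.properties: Dict[str, Any] = dict(properties)

    def get(self, name: str) -> Any:
        """Look a property up locally first, then along the prototype chain."""
        node: Optional[Component] = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]
            node = node.prototype
        raise KeyError(name)


base_page = Component(background="white", font="Helvetica")
order_page = Component(prototype=base_page, title="Order entry")
help_page = Component(prototype=base_page, title="Help", font="Courier")

# One modification of the prototype is visible in every component using it.
base_page.properties["background"] = "grey"
```

Here order_page and help_page both report the new background, while help_page keeps its own overriding font.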

3.4 Evolution with Domain-Components 

The requirements for a software system change as time goes by. It is obvious that 
many kinds of influences are responsible for this, such as new regulations, changes in 
corporate identity or an extension of functionality. Such maintenance tasks are 
difficult to handle if the application has not been designed with the possibility of later 
changes and extensions in mind. 

To allow for a disciplined and manageable evolution of a Web-application in the 
future it makes sense not to design the initial application on the basis of the concrete 
requirements identified at the start of the project. Instead the initial application should 
be regarded as an empty application that is suitable for accommodating functionality 
within a clearly defined evolution space. 

Fig. 7. Dimensions of a Web-application's evolution space

This approach is based on domain engineering, which has been described as a
process for creating a competence in application engineering for a family of similar 
systems [26]. During an analysis phase the properties of an application domain are 
determined. During a design phase this information is transformed into a model for 
the application domain. From this the required evolution space can be determined 
and, during the implementation phase of the domain engineering process, the initial 
application can be constructed as a framework ready to accommodate any kind of 
functionality that lies within the evolution space of the domain. 

This view can be extended to several application domains. We will therefore
describe the basic architecture of a Web-application with the term evolution bus.
The evolution bus is the initial application for all abstract application domains of a
Web-application.

It enables the management and collaboration of domain-components, i.e. WCML 
components that implement specific application domains such as Web-based 
procurement, reporting or user driven data exchange. These domain-components, also 
described as Services within the WebComposition Process Model, also represent 
prototypes for future services of the same application domain. The evolution can take 
place in two clearly defined ways (Figure 7):

• Domain specific evolution - The extension of a domain through new services, 
e.g. by prototyping an existing service of a domain. Another possibility is that 
the domain itself changes or that it receives more functionality, which requires 
the modification of the domain's initial service that serves as a prototype for 
other services. 

• Evolution of the domain set - The evolution of an application is also possible 
through the modification of the domain set. The extension of an application's 
functionality by adding a new application domain takes place e.g. when a 
shopping basket and corresponding functionality is added to a Web-based 
product catalog. The integration of a new domain is realized by connecting a 
new initial service to the evolution bus. 
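
The two kinds of evolution listed above can be sketched as two operations on a bus object. The classes EvolutionBus and Service, their method names and the example settings are hypothetical and only mirror the description; they are not the framework of [13].

```python
from typing import Dict, List


class Service:
    """Illustrative domain-component with a name and simple settings."""

    def __init__(self, name: str, settings: Dict[str, str]) -> None:
        self.name = name
        self.settings = dict(settings)

    def derive(self, name: str, **overrides: str) -> "Service":
        """Create a new service that uses this one as its prototype."""
        return Service(name, {**self.settings, **overrides})


class EvolutionBus:
    def __init__(self) -> None:
        # Each domain maps to its services; the first one is the initial service.
        self.domains: Dict[str, List[Service]] = {}

    def add_domain(self, domain: str, initial_service: Service) -> None:
        """Evolution of the domain set: connect a new initial service."""
        if domain in self.domains:
            raise ValueError(f"domain already connected: {domain}")
        self.domains[domain] = [initial_service]

    def add_service(self, domain: str, name: str, **overrides: str) -> Service:
        """Domain specific evolution: prototype the domain's initial service."""
        service = self.domains[domain][0].derive(name, **overrides)
        self.domains[domain].append(service)
        return service


bus = EvolutionBus()
bus.add_domain("catalog", Service("catalog", {"language": "en", "layout": "default"}))
german_catalog = bus.add_service("catalog", "catalog-de", language="de")
bus.add_domain("basket", Service("basket", {"currency": "EUR"}))
```

Adding the shopping basket changes the domain set, whereas the German catalog extends an existing domain by prototyping its initial service.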

A framework for implementing the evolution bus and the services can be 
developed with WCML as described in [13]. Furthermore, due to the different levels 
of granularity found in service components, tools can be developed that support 
evolution in special ways, such as service factories and domain specific languages
[21]. The second phase of the WebComposition Process Model is therefore dedicated 
to Producer Reuse. Figure 8 gives a detailed overview of the complete process. 




Fig. 8. Complete evolution process 

4 Application of the WebComposition Process Model

In a joint project between the Telecooperation Office (TecO) at the University of 
Karlsruhe and Hewlett-Packard (HP) a service-oriented e-commerce application, 
Eurovictor, has been developed based on the WebComposition Process Model. The
aim of the project was to develop a support system that enables development, 
management, maintenance and evolution of services within the heterogeneous 
environment of the European intranet of the HP company. An evolution bus was 
realized as part of the project. The evolution of the application took place through the 
integration of several domain specific services such as presentation of information, 
software orders or product purchase. These services serve as prototypes for the 
domain specific evolution of the application. 




Fig. 9. The Eurovictor example 

Figure 9 shows the Web-application's start page. The left part contains a menu that
allows for a selection of services. In the middle and on the right side two special 
services can be found: A service for the adaptation of the application to the behavior 
of the current user and another service offering shopping basket functionality. The 
adaptation service demonstrates the flexibility the application gained due to the 
domain-component specific architecture it is based on. 

A large part of the domain specific evolution of the application has been triggered 
by its international scope. For example, new services adapted to national layouts,
languages or legal regulations are constructed through inheritance. Services are now
developed in almost all European countries in a distributed and decentralized manner
and made available via the Eurovictor system.

5 Conclusion

The Web has been established as a major platform for applications, although the 
underlying implementation model complicates the development and evolution of 
Web-applications. The increasing complexity of Web-applications makes a 
disciplined approach for development and evolution mandatory. It is therefore 
desirable to use component-based software development and its advantages in Web 
Engineering. Unfortunately this is made difficult through the basic principles of the 
Web that demand autonomous development within a heterogeneous environment. 

The WebComposition Process Model describes a consistent approach to the 
development of Web-applications as component software. It introduces the concept of 
an open process model that allows for the integration of arbitrary processes for the 
development and reuse of components i.e. parts of the desired application. Reuse is 
supported by modeling all artifacts during the life cycle of an application component 
as standardized components. Furthermore, the reuse model explicitly supports the use 
of artifacts by different processes. The artifacts are modeled as components with the 
WebComposition Markup Language. WCML is an application of XML and is in 
concordance with the basic principles of the Web. In the WebComposition Process 
Model the evolution of a Web-application is planned based on the concept of domain 
engineering. The application domains are described through services that correspond 
to domain-components. The evolution by application domains is a central part of the 
process model and is described through the so-called Evolution Bus, a framework for 
the integration of domain-components. The WebComposition Process Model has been 
successfully applied to several real world applications. The advantages for the 
evolution could be verified in a project for a large international Web-application at the 
company Hewlett-Packard that has been developed according to the process model. 

The WCML-Compiler and further information related to the WebComposition 
approach can be found at: http://www.webengineering.org 

Abstract. This paper examines the issues related to developing web
applications that use digital media, with particular emphasis on digital 
video. The nature of digital video brings additional complexity to 
engineering solutions on the web due to the large data sizes in 
comparison with text, the temporal nature of video, proprietary data 
formats, and issues related to separation of functionality between 
content creation, content indexing with associated metadata, and 
content delivery. The goal of this paper is to contribute to the 
understanding of different component technologies involved in 
deploying video-based web applications, and the tradeoffs involved 
with each option. As an illustrative example, we describe the
requirements leading to the architecture of a video-based web
application, CueVideo: a system for search and browse of video and
related material.



1 Introduction 

Web Engineering has been defined as the use of sound scientific, engineering and 
management principles and disciplined and systematic approaches to the successful
development, deployment and maintenance of high quality web-based systems and
applications [7]. This paper examines how multimedia, and in particular digital video,
is supported in a typical web application. We describe the issues related to the usage
of digital video in applications, and summarize the current state-of-the-art
technologies that enable the deployment of video in web applications. Finally, we
describe the components of a specific web application deploying video, and illustrate 
the reasons that lead to this particular architecture. 

Today, large collections of multimedia documents can be found in diverse
application domains such as the broadcast industry, education, medical imaging, and 
geographic information systems. Digital video libraries are becoming pervasive since 
infrastructure needs such as storage requirements, computing power, and high 
bandwidth networks are being addressed adequately. Video streaming technology 
allows content to be delivered to the user as a continuous flow of data with minimal 
wait time before playback, such that the user does not need to partially or fully 
download the video files before starting to view the video. Further, a variety of video 
editing, capture and compression tools are available to facilitate the manipulation of
digital video. Seemingly then, all component technologies necessary for deploying 
web applications containing video are available. However, as we start to develop such 
an application, several architectural issues arise that are unique to the characteristics 
of digital video. These include partitioning of video related application functionality 
between front-end and back-end, and how the video control program logic gets to the 
front-end. Another source of problems is the application functionality and the selection
among competing transport protocols, video compression codecs and containers on the
web. For example, there exist several standard transport protocols and video
compression codecs; however, there is no universal consensus in their usage and 
proprietary alternatives are often used. This leads to incompatibilities between 
platforms and browsers, and requires additional development to support web 
applications deploying video. 

Typical web applications may be characterized as adhering to a three-tier 
architecture. The front-end or user interface is delivered to the client by the web 
server, and is rendered by any HTML capable web browser. The middle-tier 
consolidates the application specific logic and is invoked by the web server using a 
standard API. The middle-tier may also connect to the third-tier or back-end database 
using traditional client-server communication protocols independent of the web. Such
an architecture enables the separation of the GUI from the application specific logic.
One issue specific to video-based web applications is that searchable
metadata associated with video is often kept separate from the servers that actually
deliver video. For example, the metadata may reside in the third-tier such as a 
relational database, while the video may be streamed from a streaming media server. 
The streaming media server does not strictly belong to the third-tier since it directly 
streams media to the client. It may be seen as a new extension to the middle-tier, since
it does not fit into the application server directly. 
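
The separation between searchable metadata and media delivery can be sketched as follows. The sketch assumes an in-memory SQLite table as the third-tier metadata store and a hypothetical streaming server address (media.example.org); the table layout, the sample rows and the search function are illustrative and do not describe any particular system.

```python
import sqlite3

# Third tier (assumed): a relational table holding searchable metadata only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE segments (video_id TEXT, start_s INTEGER, transcript TEXT)"
)
db.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [
        ("lecture1", 0, "introduction to streaming media"),
        ("lecture1", 120, "bandwidth considerations for video"),
        ("lecture2", 45, "media player plug-ins"),
    ],
)

# Hypothetical address of a separate streaming media server.
STREAM_BASE = "rtsp://media.example.org/"


def search(term: str) -> list:
    """Middle tier: query the metadata and return (stream URL, offset) pairs.

    The video data itself never passes through this function; the client
    is expected to open the returned URL against the streaming server.
    """
    rows = db.execute(
        "SELECT video_id, start_s FROM segments "
        "WHERE transcript LIKE ? ORDER BY video_id, start_s",
        (f"%{term}%",),
    ).fetchall()
    return [(STREAM_BASE + video_id, start_s) for video_id, start_s in rows]


hits = search("bandwidth")
```

The application logic only hands out locations and offsets, which is why the streaming server sits beside the application server rather than behind it.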

We view the process of developing web applications that deploy video as having 
four components to it: 

1. Content creation including generation of digital video and authoring
multimedia presentations 

2. Content cataloging and indexing tools that create searchable metadata 
associated with the media 

3. Design of search and browse system that includes search engine and user
interface design 

4. Content delivery and presentation 

The challenges associated with all four components listed above currently serve as 
impediments to the widespread usage of video. Certain operational aspects related to
this such as streaming media technologies, and optimization algorithms for efficient 
content delivery on the web are being adequately addressed by advances in 
technology. In this paper, we focus on software engineering issues related to the 
fourth component involving multimedia content delivery and presentation on the web. 
First, we describe general issues related to streaming media on the web, and second, 
we describe specific application issues related to developing advanced streaming 
media functionality on the web. 

2 Streaming Media

Streaming media is a necessary component of all web applications that deploy video. 
There are several factors that must be considered when selecting a particular
streaming media technology. We describe some of those factors. 

2.1 Media Player Functionality 

The general requirements for any streaming video player are reliability, support on 
multiple platforms, and the availability of an SDK that provides programmable 
control over the GUI. Consistent with the thin-client model, minimal installation and
configuration on the client side is desirable. Video streaming typically provides an
application with direct access to any arbitrary location in the video and immediate
playback from that point with only a small delay for buffering. For advanced
applications, additional requirements may be the ability to switch playback between
different videos in the same player window, and the ability to create hotspots or
linked regions in the video. Some of these requirements are addressed by the W3C
SMIL standard [4] for synchronized multimedia over the web.

The W3C SMIL standard allows the integration of a set of independent multimedia 
objects into a synchronized multimedia presentation. Since a static web page has no
internal timeline, there exists no preset order in which images and text are downloaded.
The SMIL format enables a presentation comprising multiple video/audio clips,
images, and text to be synchronized on a single presentation timeline. Using SMIL, an
author can describe the temporal behavior of the presentation, describe the layout of
the presentation on a screen, and can associate hyperlinks with media objects.
Client/server support of SMIL format enables an application to render powerful
synchronized multimedia presentations. Today, most video streaming servers and
players support the SMIL format based on standard and proprietary formats for
compressed video [6, 9, 11, 17].

2.2 Media Player Client Software 

Almost all solutions available today are based on browser plug-ins/ActiveX
components with varying levels of desired functionality. A plug-in is a unit of code, a
shared library or a DLL, which is linked into the web browser, associated
with a MIME type, and activated when the user retrieves a resource of that type. It is
then given some control over the browser's screen real estate and can render the
resource in whatever way is appropriate. Since plug-ins consist of native code, they
are not cross-platform and have full access to all resources of the client host and thus
are not very secure. The plug-in code is usually downloadable over the web and has
to be installed or executed. Hence there is an issue with maintenance and propagation
of updates. The other option is Java applets that enable the safe execution of
downloadable content. However, this has not been as popular since it requires
considerable development effort, has possible performance issues, and continues to
rely on native code in some cases. Some Java applet solutions are highly optimized
for minimal code size and do not rely on native code. However, the limited code size
imposes limits on player performance and functionality.

2.3 Client/Server Platforms

In the area of multimedia, selection of an operating system for a media server proves 
more challenging than traditional web solution deployment. In an effort to reduce 
web support costs, companies often select a standard platform (hardware, operating 
system etc.) for web servers, which may not be equipped to handle media streaming. 
The selection of both server and client platforms for streaming media has different
implications in web applications deploying video:

• Web-browsers are intended to support client applications in a platform- 
independent manner. This is not entirely true with streaming media applications. 
Currently, different platforms support a subset of the available streaming formats. 
For example, the plug-in player that supports QuickTime streaming is available 
on the Macintosh and Windows platforms only. Similarly, the plug-in player that 
supports Windows Media streaming is available on the Windows platform only. 

• Streaming servers must be deployed on an operating system that supports the 
latest release of streaming technology in order to take advantage of new features. 
For example, Apple’s current open source QuickTime streaming server is 
available on the Linux platform only. This means that QuickTime streaming can 
be deployed only if Linux is selected as the streaming server platform. 

• Media servers may incur occasional downtime in order to incorporate latest 
upgrades. Therefore, streaming media is best hosted on a platform dedicated to 
providing optimized media functionality without combining it with critical 
functionality such as transaction databases, home page web servers, etc. 

2.4 Bandwidth Considerations 

It is necessary to target a network connection bandwidth for delivering a successful 
streaming media presentation. The content must be created for a particular bandwidth. 
The goal is to create presentations with bit rates that do not exceed the end-to-end 
bandwidth (from the streaming media source to the client) in order to avoid loss of 
data. The target bandwidth is defined to be the maximum bandwidth available for a 
particular network connection. The total bit rate of the streaming presentation must be 
at or below the target bandwidth. The total bit rate of a presentation consists of two 
components: the maximum bit rate consumed by all streaming tracks, and a specific 
percentage, say 25% [11], of target bandwidth for overhead such as connection noise, 
data loss, and packet overhead. 
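
The bit rate budget described above can be written down as a short calculation. The sketch uses the 25% overhead figure mentioned in the text as a default; the 56 kbps connection and the track bit rates are assumed example values.

```python
from typing import Iterable


def max_stream_bitrate(target_kbps: float,
                       overhead_fraction: float = 0.25) -> float:
    """Bit rate left for all streaming tracks after reserving the overhead."""
    return target_kbps * (1.0 - overhead_fraction)


def fits(track_kbps: Iterable[float], target_kbps: float,
         overhead_fraction: float = 0.25) -> bool:
    """True if tracks plus overhead stay at or below the target bandwidth."""
    total = sum(track_kbps) + overhead_fraction * target_kbps
    return total <= target_kbps


# Assumed example: a 56 kbps connection carrying one video and one audio track.
budget = max_stream_bitrate(56)        # 42.0 kbps available for the tracks
ok = fits([32, 8], 56)                 # 40 + 14 = 54 kbps, within the target
too_much = fits([34, 10], 56)          # 44 + 14 = 58 kbps, exceeds the target
```

The overhead is taken as a fraction of the target bandwidth, not of the stream bit rate, which follows the wording of the paragraph above.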

There are two approaches to ensure smooth delivery of streamed presentations for 
varying bandwidths: Some video encoders allow encoding of a single clip that targets 
multiple bandwidths [11] by including several streams at different bandwidths, and 
associated synchronization information. Such encoding enables on-the-fly switching 
to a lower bandwidth encoding under high network traffic to ensure smooth media 
delivery. The other approach is to create multiple versions of the clips for different 
bandwidths, and use a SMIL file to designate a target bandwidth for each of the 
groups when assembling the presentation. Yet another option is to provide low 
bandwidth representations of the original video based on analysis of the video 
content. These representations include storyboards, moving storyboards (slide shows) 
with and without audio, fast playbacks, etc. [2, 14]. 
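
The selection among several encoded versions of a clip can be sketched as picking the highest bit rate that does not exceed the available bandwidth. The function name and the list of encodings are assumptions for illustration; in practice this choice is made by the player or expressed in a SMIL file.

```python
from typing import List, Optional


def pick_version(encodings_kbps: List[int],
                 available_kbps: float) -> Optional[int]:
    """Return the highest encoding that fits, or None if none fits."""
    candidates = [rate for rate in encodings_kbps if rate <= available_kbps]
    return max(candidates) if candidates else None


# Assumed set of encodings prepared for one clip.
versions = [20, 34, 80, 225]

broadband_choice = pick_version(versions, 100)   # picks the 80 kbps version
modem_choice = pick_version(versions, 42)        # picks the 34 kbps version
no_choice = pick_version(versions, 10)           # nothing fits
```

When no encoding fits, an application could fall back to one of the low bandwidth representations mentioned above, such as a storyboard.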

Bandwidth considerations also impact the scalability of a web application
deploying video. Since the total bandwidth available to a media server is fixed, trade- 
offs must be made in deciding how that bandwidth is allocated. From the media 
server’s perspective, this may be optimized using a combination of mirrored sites and 
caching technology. For example, a classroom of 100 students simultaneously 
accessing the same video clip can be better served with a geographically close 
mirrored web site and caching technology, than with a mirrored web site alone. From 
the application’s perspective, it will always be possible to exceed the capacity of the 
bandwidth, despite all enhancements to optimize bandwidth on the server side. In 
general, a compromise must be made between a smaller number of high bandwidth 
connections, and a large number of low bandwidth connections. 
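
The compromise can be made concrete with a rough capacity estimate. The sketch below assumes a fixed server bandwidth shared equally among streams of one bit rate; the function name and the figures in the usage note are illustrative only.

```javascript
// Sketch: number of simultaneous streams a server with fixed outbound
// bandwidth can sustain at a given per-stream bit rate.
function maxConnections(serverKbps, perStreamKbps) {
  if (perStreamKbps <= 0) {
    throw new RangeError("perStreamKbps must be positive");
  }
  return Math.floor(serverKbps / perStreamKbps);
}
```

With 10,000 Kbps available, for example, the same server could carry 33 streams at 300 Kbps or 294 streams at 34 Kbps.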

In summary, the following considerations must be evaluated in selecting a 
streaming media technology [10]: 

• Streaming media formats supported by browser plug-in player 

• Browser installation: plug-in versus applet 

• Availability of SDK with API access, ease of development 

• Tools for creation of rich media content 

• Support for SMIL format to provide spatio-temporal synchronization of media at 
presentation time 

• Target platform of both streaming media server and client (e.g. Linux, Windows, 
Macintosh, set-top box, etc.) 

3 Streaming Media Applications 

Today, web applications have taken on a lot more complexity than merely being 
static sources of information. They include e-business applications, distance learning 
and training, enterprise-wide planning systems, transactional systems, collaborative 
work environments, etc. The same parallel is seen in web-based video applications: 
they vary from being relatively static information sources (as in streaming short, well 
edited video clips) to increasingly complex applications such as e-business and 
transactional systems that must support highly interactive, searchable, large 
collections of media. 

In its simplest form, a single digital video may be delivered using streaming video 
technology. This means a user can play the video clip from start to end, and have 
access to VCR-like controls such as Play/Pause/Stop/Forward/Rewind for 
browsing/viewing the video. This is often the case with video clips in broadcast news 
web sites. Such applications are adequately handled by most of the available video 
streaming client/server architectures. The video content is typically edited, and 
prepared manually to be effective, short and most appealing to the user. In contrast, an 
advanced application may be one where the streaming video is highly interactive and 
searchable, and can be much longer (tens of minutes or more). Such applications 
are powered by video cataloging and indexing tools [2, 3, 13, 14, 16] that enable 
sophisticated search algorithms to make video searchable. Another example of an 
advanced video-based web application is one where user navigation triggers database 
search, as a result of which streaming media is dynamically composed into a web 
page so as to display contextual media to the user. Key issues like the degree to which 
the web-based client contains intelligence, and how that intelligence is made 
available to the front end are very much dependent on the application functionality. In 
general, tradeoffs are made between desired functionality and performance. Simple 
applications that essentially provide a few short streaming video clips with standard 
VCR-like controls for playback can be architected with minimal intelligence at the 
front end. In contrast, highly interactive applications that include media search, and 
intelligent content-based video representations may involve some program logic at the 
client for performance reasons. 

3.1 Architectural Options 

We examine various architectural options for deploying video in web applications 
with progressively increasing complexity and functionality. The key issue being 
illustrated here is the extent of intelligence contained in the client, and the 
implications of this on the client's processing load and server storage/processing load. 




Figure 1 illustrates the architecture of the simplest form of streaming video on the 
web. The functionality provided is that of playing video on demand with standard 
VCR-like controls. An HTML page containing a link to a streaming media 
presentation is rendered in the browser, based on an HTTP request to the web server. A 
user click on that link causes the player plug-in to establish two connections between 
the streaming media server and the player plug-in. The first is a two-way TCP 
connection to send commands from the player to the server such as Play/Pause/Stop, 
etc. This channel also supports authentication and logging functionality on the server. 
The second connection is a one way data channel established from the streaming 
media server to the player plug-in to stream the media. 

Figure 2 illustrates the architecture of a form-based two- or three-tier web 
application in the context of video [5]. A simple example is one of typing keywords 
for search in a video, and retrieval of relevant video segments with the ability to 
playback the retrieved segments. First, a standard “search” page (in HTML) is 
presented. It contains a form with fields for typing in keywords for search in a video. 
The user types in a query and clicks on the submit button. As a result, an HTTP 
request is sent to the web server, in response to which the application server is 
invoked. The application server performs the search, and generates the results HTML 
page containing links to time offsets into the video (based on the query results) to 
support playback of the relevant video segments. The client can then play a video 
segment by communicating directly with the video streaming server. 
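
The page-generation step in this flow can be sketched as below. It is an illustration only: the hit record shape `{ title, offsetSeconds }`, the function names, and the `play?video=...&start=...` link scheme are all hypothetical and do not correspond to any particular streaming server.

```javascript
// Escape text for safe inclusion in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Sketch: turn query hits into an HTML list whose links carry the time
// offset (in whole seconds) at which playback should begin.
// The URL scheme used for the links is a placeholder.
function renderResults(videoId, hits) {
  const items = hits.map((hit) => {
    const start = Math.floor(hit.offsetSeconds);
    const href = `play?video=${encodeURIComponent(videoId)}&start=${start}`;
    return `<li><a href="${escapeHtml(href)}">${escapeHtml(hit.title)}</a></li>`;
  });
  return `<ul>\n${items.join("\n")}\n</ul>`;
}
```

How such a link is turned into offset playback is the question taken up in the remainder of this section.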




Figure 2: Streaming Video in a Two-tier Web Application 

This example illustrates a simple instance of a more general issue, i.e. where 
should application specific intelligence be generated and reside, at the client or at the 
server. The location of the query processing to generate the time offsets is not an 
issue. This adheres to the three-tier architecture in web applications where application 
specific logic is invoked by the web server using a standard API. The issue is with 
respect to implementing playback of video with time offsets based on text query 
results. We have the following options: 

• The SMIL standard supports playback of a video with a time offset. On receipt of 
a query, the application server can generate a new SMIL file with the necessary 
time offsets, and refer to this file in the results HTML. A user click on that link 
will cause the player plug-in to begin playback based on the time offset specified 
in the SMIL file. This has the disadvantage of having to create several temporary 
SMIL files for each query, and to maintain and remove them. 

• The front-end being rendered by the browser may include program logic to begin 
playback with the specified time offsets. Most streaming video servers and plug- 
ins support a limited set of Java/JavaScript APIs for implementing such 
functionality. Since HTML is primarily a markup language for content 
presentation, and not for programming logic, the application server must generate 
the necessary Java/JavaScript code to support the functionality of offset playing 
based on query results. This adds complexity to the issue of separating program 
logic from presentation logic at the application server. 

While neither option is optimal, the specifics of the application may favor one 
option over the other. 
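
As an illustration of the first option, the following sketch generates the text of a small SMIL document that starts a clip at a given offset. It assumes the SMIL 1.0 `clip-begin` attribute with a normal-play-time value; the function name and the media URL in the usage note are hypothetical, and the attribute syntax accepted by a given player should be checked against its documentation.

```javascript
// Sketch: build a minimal SMIL document that plays one video from a
// given offset. Assumes SMIL 1.0 style clip-begin="npt=<seconds>s".
function buildOffsetSmil(mediaUrl, startSeconds) {
  const start = Math.max(0, Math.floor(startSeconds)); // whole, non-negative seconds
  // Escape characters that are not allowed inside an XML attribute value.
  const src = String(mediaUrl)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
  return [
    "<smil>",
    "  <body>",
    `    <video src="${src}" clip-begin="npt=${start}s"/>`,
    "  </body>",
    "</smil>",
  ].join("\n");
}
```

Calling `buildOffsetSmil("rtsp://media.example.com/seminar.rm", 125.9)` yields a document whose video element starts at 125 seconds. One such temporary file per query result is exactly the maintenance burden noted above.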

The example in Figure 2 illustrates the simple GUI functionality of 
supporting offset playback based on the query results. Within the same framework, 
certain applications may require advanced GUI functionality with respect to video 
playback. Consider a “smart” video browsing interface that incorporates several 
representations of a given video such as a slide show, compressed audio, original 
video, etc. [2, 14]. An example of advanced GUI functionality in this case may be that 
of video controls to switch viewing from one representation to another, starting from 
the time offset of the previously playing video. Current plug-in players establish 
stateless sessions where the player maintains no information about prior videos 
viewed, current position in video, etc. The only option, therefore, is to use the 
Java/JavaScript API to implement such functionality. Streaming video servers and 
plug-ins typically support a set of asynchronous APIs without reliable callback 
support for notification methods. This requires complex programming in JavaScript to 
provide synchronous functionality based on asynchronous APIs and to maintain state 
information in the browser. Providing a robust solution with cross-browser 
compatibility, given the differences in scripting language support and plug-in/ActiveX 
support, proves to be extremely challenging. Figure 3 illustrates JavaScript functions 
present at the client for providing synchronous playback based on asynchronous 
streaming video APIs. 
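
The polling pattern behind such functions can also be sketched independently of any player. The helper below is a hypothetical generalisation, not part of a plug-in API: `getState` stands in for a call such as a play-state query, and the retry limit is an added safeguard so that a player that never reaches the wanted state does not poll forever.

```javascript
// Sketch: poll an asynchronous component until it reports the wanted
// state, then run a callback; give up after a bounded number of tries.
// All names and defaults here are illustrative assumptions.
function waitForState(getState, wanted, onReady, onTimeout, opts = {}) {
  const { intervalMs = 100, maxTries = 50, schedule = setTimeout } = opts;
  let tries = 0;
  function poll() {
    if (getState() === wanted) {
      onReady();
      return;
    }
    tries += 1;
    if (tries >= maxTries) {
      onTimeout();
      return;
    }
    schedule(poll, intervalMs); // check again after the polling interval
  }
  poll();
}
```

The `schedule` option exists only so that the timer can be replaced, for example in tests; in a page it would normally be left at its default.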

var CurntPos = 0;  // position saved from the previously playing video
var myTimer;       // handle of the pending polling timer

function JumpToVideo(newvideo) {
    // Save the current position, then switch to the new source
    CurntPos = document.Video.GetPosition();
    document.Video.DoStop();
    document.Video.SetSource(newvideo);
    document.Video.DoPlay();

    // WAIT for play to begin
    myTimer = setTimeout('WaitForPlay()', 100);
}

function WaitForPlay() {
    // If NOT yet playing, keep waiting
    if (document.Video.GetPlayState() != 3) {
        myTimer = setTimeout('WaitForPlay()', 100);
    } else {
        // Playing: pause, seek to the saved position, and resume
        document.Video.DoPause();
        document.Video.SetPosition(CurntPos);
        document.Video.DoPlay();
    }
}

Figure 3: JavaScript Functions for Synchronous Playback 

Another example of the tradeoffs between the processing power requirements of 
the client and storage requirements on the server is illustrated in a video browsing 
application [8]. The study describes three options for streaming content-based video 
summaries, in this case, a summary based on speedup of audio (and therefore, video) 
without change in pitch, intonation and overall quality. Such summaries may be 
precomputed and stored on the server, computed by the server on-the-fly on client 
request, or computed on the client in real time. Precomputing the summaries on the 
server requires preprocessing of the audio and storing different versions at several 
different speeds (depending on the video format, in some cases the entire video must 
be re-encoded and stored). In this case the client only needs to select the desired 
representation, i.e. playback speed. However, a synchronization issue (as discussed 
above) occurs if the client should change the speed during playback. In contrast to 
this solution, the speedup algorithm may be implemented on the client. This requires 
additional computation power on the client that is already busy with decoding and 
presenting the video. It also requires higher streaming rates by the server, since both 
audio and video must be played faster than the original speed. The overall 
recommendation in this particular study was to use server storage and precomputed 
discrete speeds, rather than to overload the client with additional processing and 
synchronization associated with client control of playback speed. 
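
The costs weighed in this comparison can be estimated roughly as follows. The sketch assumes that each precomputed version is encoded at the same bit rate as the original, so that a version at speed s simply lasts 1/s as long; both function names and this assumption are ours, not the cited study's.

```javascript
// Sketch: server storage (in megabytes) needed to keep one precomputed
// version of a clip per playback speed, all at the same bit rate.
function precomputedStorageMB(durationSec, kbps, speeds) {
  const kilobits = speeds.reduce(
    (sum, speed) => sum + (durationSec / speed) * kbps,
    0
  );
  return kilobits / 8 / 1000; // kilobits -> kilobytes -> megabytes
}

// Sketch: streaming rate needed when the client itself speeds up the
// original stream, since data must arrive `speed` times faster.
function clientSpeedupKbps(kbps, speed) {
  return kbps * speed;
}
```

For a one-hour clip at 100 Kbps, storing versions at speeds 1 and 2 takes about 67.5 MB, whereas client-side speedup at 1.5 times needs a 150 Kbps stream.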



3.2 Related Issues 

So far, we have described specific considerations related to the architecture of web 
applications deploying streaming video. The web today is used by millions as a 
primary source of information and a medium to collaborate, communicate and share 
knowledge. Information technology in the form of digital libraries has explored issues 
of organization, access, security, and distributed information sources on the web [12]. 
This brings out several other issues that must be considered when deploying video in 
real-world operational systems: 

• Separation of second tier (application server with search engine), and third-tier 
(metadata related to media) from the actual storage and delivery systems that 
serve the media. The storage and delivery systems may be separate on dedicated 
servers placed in selected geographical locations or centers with high bandwidth 
connections. They may also be replicated to ensure better performance. As an 
example, there exists technology to provide dependable, high-performance 
delivery of streaming media using sophisticated optimization algorithms for 
content delivery [1]. 

• Interoperability: Today, the choice of video formats prevents interoperability 
among heterogeneous repositories distributed on the web. 

• Similarly, the lack of standards for metadata descriptions prevents the use of 
common search engines across various video databases. The emerging MPEG-7 
standard for audiovisual data attempts to address this problem [15]. 

• Media Search Engines: As video search technology matures, a unified 
architecture is needed to accommodate different search engines specific to video. 

• Media content is typically associated with copyright issues that may influence the 
way in which search engines work on media. For example, the composition of 
video summaries from the original video by anyone other than the content owner 
may be prevented by legal copyright requirements in certain media industries. 

• Mirrored web sites: While the use of distributed mirrors increases reliability, 
accessibility and capacity, it introduces the issues of keeping the mirror sites 
synchronized and of synchronizing user state. This is further complicated by the 
separation of the second and third tiers from the streaming media servers that 
store and deliver the video. As with HTTP, the best strategy to address 
scalability is to deploy caching media servers at client sites. 



4 Case Study: CueVideo 

In the CueVideo [2, 14] project, we combine computer vision techniques for video 
content analysis, and speech recognition for spoken document retrieval, together with 
a user interface designed for rapid filtering and comprehension of video content. We 
are working on different applications that deploy this technology using a web 
infrastructure such as distributed learning, just-in-time training, e-business and 
knowledge mining. As an application, we would categorize this as a fairly advanced 
web-based video application which allows searching of video collections, playback of 
relevant video segments, the ability to start playback of a new representation of the video 
based on the current playback position, etc. Our current requirements target an intranet 
environment with a few hundred videos in the collection. Examples of typical 
content include video recordings of seminars, technical presentations, training tapes, 
on-line classes, etc. The anticipated number of simultaneous users accessing 
streaming video is around 25. 

4.1 Requirements 

Our general requirements for web applications deploying video were as 
follows: 

• Use of standard web technology, with particular emphasis on using standard 
technology on the client-side. Our solution was not to mandate specialized client 
setup or technology other than the media player in order to appeal to a large user 
community. 

• Selection of a widely-used video format for streaming compressed video on the 
web. Since the quality of web-based video is rapidly improving, the selection of 
a format that is able to capitalize on these improvements was important. 

• Video players for the client side that support immediate playback without waiting 
for the entire media file to load, including offset playback to support retrieval of 
application driven segments. We also needed advanced player functionality to 
support switching of video from one source to another based on user action. 

• Video encoding tools that support standard video formats as inputs and generate 
standard compressed streaming formats. Accepting input from other standard 
formats allows an encoder to take maximum advantage of existing repositories of 
other media types (e.g., MPEG-1, AVI). Additionally, it was necessary for the 
encoding tools to offer either API access or command line access for automated 
batch processing. 

• As an alternative to converting existing media to a streaming format using video 
encoding tools, we wanted the option of developing extensions to the base 
streaming server/player to support new media formats. 



4.2 Architectural Solution 

We made the following choices to meet our requirements: 

• HTTP based HTML applications with a browser plug-in to support video 
playback functionality in a standard web two-tier application. In addition, we also 
use Dynamic HTML (DHTML) to support some of the advanced video playback 
functionality. The three main technologies that make up DHTML are HTML, 
JavaScript and Cascading Style Sheets (CSS). HTML is used for the basic 
structure of the document, JavaScript to manipulate the Document Object Model 
(DOM), and Cascading Style Sheets (CSS) to define the presentation and style of 
the document. As a result, the new tags supported in HTML trigger additional 
JavaScript events enabling more control over the content being rendered. 

• Thin client model with CGI application layer that allows users to query the 
videos using different search modes. The advanced video browsing content was 
precomputed and streamed from the server. However, video control logic such as 
time offsets, names of appropriate video source files, etc. was generated by the 
CGI program and encapsulated in DHTML being rendered at the client. 

• Streaming video solution using RealNetworks [11]. RealServer has an 
extensible, component based architecture that allows add-on support for 
additional media formats such as MPEG-1. The client video playback support is 
via RealPlayer plug-in. The RealProducer encoding tools enable the creation of 
RealMedia content from a variety of input sources, including live feed, AVI and 
MPEG-1. It also supports command line processing and API access, allowing 
smooth integration into applications. 

4.3 Tradeoffs/Lessons 

The tradeoffs associated with our choice of tools and methods were as follows: 

• The advantages of HTML are the centralized maintenance, cross-platform nature, 
simple and consistent user interface, and scalability offered due to caching of 
HTTP clients. The disadvantage of HTTP based services is that HTTP is stateless 
and does not support server side initiated communications. Further, HTML 
supports only simple user interfaces while advanced GUIs need a richer widget 
set. Our choice of DHTML to provide additional video player functionality has 
its disadvantages. DHTML is not yet stable enough to use in mission critical web 
pages, and cross-browser incompatibility due to differences in the document object 
model between browsers is a big cause for concern. We had to craft our web 
pages for the different platforms we targeted, in particular Netscape Navigator 
and Microsoft Internet Explorer for Windows and Macintosh. 

• Streaming video is primarily supported with the use of native code using plug-ins 
for browsers. We chose this option over the Java applet option, which supported 
proprietary formats, offered limited functionality, and incurred higher development 
costs. As a result, clients have to incur the disadvantage of installing plug-in 
video players. 

• The tradeoffs associated with including video control logic in a dynamically 
generated HTML page have to do with function and performance. The video 
control logic in response to user input is generated as JavaScript functions. This 
contributes to about 6Kb of code that must be served using HTTP for each 
dynamically generated page. Given the available intranet bandwidth, this 
overhead was reasonable for the additional functionality gained. 

• The tradeoff between quality and bandwidth is a result of the encoding 
technology, and the method used for bandwidth selection during streaming. For 
example, the RealNetworks streaming server incorporates the concept of stalling 
under low bandwidth conditions. That is, when the client does not have 
sufficient data for playback, the server stops streaming until such time that the 
client buffers enough data to continue playing smoothly. This feature coupled 
with seamless switching to a lower bandwidth encoding under low bandwidth 
(SureStream technology) results in relatively smooth streaming media 
presentations under unpredictable network traffic conditions. We compromised 
on the quality of the video with the lower bandwidth encoding, in exchange for 
the smooth playback functionality without loss of data. 

5 Conclusions 

In this paper, we have highlighted the architectural considerations for developing web 
applications deploying video. While there is no clear right answer, we have pointed 
out the pragmatic issues that must be considered and the options associated with each 
architectural decision given the available technology. Advanced web-based video 
applications typically require some amount of computation and player control beyond 
streaming. In general, the thin client model is favored where the intelligence, and 
therefore, the processing requirements are at the streaming server. Intelligent 
interfaces that accompany such applications are rendered at the client, but typically 
composed at the server by the application. This is a trend; however, the specific goals 
of any project will dictate the final choice of tools and architecture in deploying a 
solution. 


Abstract. As hyperdocuments grow and offer more and more contents 
and services, some of them become more sensitive and should only be 
accessed by very specific users. Moreover, hypermedia applications can 
offer different views and manipulation abilities to different users, 
depending on the role they play in a particular context. Such security 
requirements have to be integrated into the development process in such 
a way that what is understood by a proper and safe manipulation of a 
hyperdocument has to be analysed, specified and implemented using 
the appropriate abstractions. In this paper we present a high-level 
security model applied to the modelling of security policies using 
components and services belonging to the hypermedia domain. The 
model uses negative ACEs and context-dependent user permissions for 
the specification of security rules. An example of its use for the design 
and operation of a web-based magazine is also described. 



1 Introduction 

The increasing popularity of multi-user hypermedia systems, chiefly represented by 
the proliferation of web-based applications, has put stress on the need for preserving 
the security of the information they hold. Indeed, security is a basic requirement of 
any multi-user information system where different users have different needs and 
responsibilities that determine their ability to access information. To better understand 
this assumption it has to be taken into account that security is not only related to 
confidentiality or privacy, as commonly thought, but also to integrity and availability 
[1]. While confidentiality is aimed at preventing information disclosure, integrity is 
concerned with mechanisms to avoid the improper modification of information [2] 
and availability deals with the ability to perform operations [3]. While the first two 
features can be defined in terms of precise and persistent properties and, therefore, 
rules to guarantee confidentiality and integrity can be specified; availability, which is 
aimed at avoiding information and resources withholding, involves many factors (e.g. 
computer accessibility) that go beyond the limits of any information access policy [4]. 
For that reason, we will use the term security to refer to confidentiality and integrity, 
or what is often called computer security [4], and we will not grapple with availability 
problems. 

As hyperdocuments grow and offer more and more contents and services, some of 
them become more sensitive than others and they should only be accessed by very 
specific users. In fact, a great number of web sites provide on-line support for 
management activities that are only allowed to a group of authorized users. However, 
hypermedia applications can support more advanced security mechanisms than 
merely denying/allowing access policies. Indeed, hyperdocuments can provide means 
to manage the assignment of different manipulation abilities or permissions to their 
users depending on the role they play in a particular context, implementing what is 
usually known in the literature as RBAC (Role Based Access Control) [5]. Regardless 
of which access control mechanism is adopted, an integrated and mature software 
engineering process cannot postpone security requirements to the implementation 
phase. On the contrary, security problems, as any other software requirement, have to 
be integrated into the whole development process [6; 7] in such a way that what is 
understood by a proper and safe manipulation of a hyperdocument has to be analysed, 
specified and implemented using the appropriate abstractions. In this sense, high-level 
security models are used to formalise security policies using components and services 
belonging to the hypermedia domain [8]. Just as security models for databases 
propose mechanisms to specify manipulation rules based on the classification of 
tables, tuples and transactions, high-level security models for hypermedia have to be 
based on elements such as nodes, links, contents and so on. Models of this kind make 
the specification of security rules easier and more consistent, since designers do not 
lose the hyperdocument context when facing security requirements. Examples of 
security rules that could be specified with a high-level model are to establish which 
contents should be delivered to which users or roles, who can modify the structure of 
the hyperdocument, who can activate a link, who can personalise items or which 
constraints have to be applied when creating or modifying a link. 

In this paper we present a high-level security model for hypermedia, that provides 
elements to define security policies, and its application to the specification of the 
security rules of a web-based application. With this purpose, we will first survey this 
model in section 2, then we will present in section 3 an example of usage in a web- 
based application. Section 4 surveys some proposals about hypermedia security and 
briefly discusses the most relevant features of the model here presented. Finally, 
section 5 outlines some conclusions derived from the development and usage of the 
model. 



2 A Model to Specify Security Rules for Hypermedia 

The high-level security model [9] is aimed at providing hypermedia designers with 
tools to specify security rules, taking it for granted that such rules are devoted to 
ensuring a proper manipulation of hyperdocuments, whether implemented or not as 
web-based applications. In the next subsection we will first introduce the security 
principles assumed and then we will describe the model itself. 

2.1 Security Principles Underlying the Model 

This multilevel security model uses context-dependent user clearances and negative 
ACLs to preserve information security. It translates Denning's axioms [10] from an 
information flow point of view to an information manipulation perspective. Instead of 
using security categories to establish can-flow relationships between objects and 
subjects, mainly intended to safeguard applications against unauthorized disclosure 
of information, we will apply such security labels to establish can-perform 
relationships between objects and subjects/actions with a view to avoiding improper 
manipulation activities. The model also assumes the following security principles: 

- Well-formed transactions. Users can only manipulate information through a 
number of restricted and controlled programs [11]. Such programs are the 
operations of the model (see table 1). Each operation includes a security check (the 
transition function) to decide whether the operation can be allowed or not. 

- Authenticated users. Only authorized users can perform operations [2]. 

- Least privilege. Operations have only those privileges they require to accomplish 
their objectives [11] and users are granted only the abilities they need to do their 
work [2]. 

- Delegation of authority. Security management is not only centralised in a security 
manager, but those actions which are not too critical can be delegated to the 
author's side [2]. In particular, the model includes a confidentiality ACL which 
can be controlled by the owners of the objects. 

- Positive and negative authorizations [12]. To ease the security management tasks, 
models have to support both positive authorizations which give access to some 
information items or services as well as negative authorizations that deny access to 
particular users. This feature is particularly important when the number of users 
tends to be unmanageable, as it happens in most web-based applications. 

- Data abstraction: security labels or categories are defined in terms of manipulation 
abilities pertaining to the domain of application. 



2.2 Specification of the Security Model 

The security model offers a number of elements (see table 1) to establish the rules that 
ensure a proper manipulation of the hyperdocument. 

To define what is proper or not, designers have to specify who can or cannot do 
what and, therefore, the first step is to organise the application domain into subjects 
and objects as has been done in classical information flow security models [10; 13; 
14], where subjects are active entities that can perform actions on objects. Then, we 
have to define what can be done, that is, which operations can be performed on the 
hyperdocument components, and which kinds of manipulation abilities we have in our 
application, which we call security categories. As we assume the least privilege 
principle (see section 2.1) we use the classification of operations to specify which 
kind of manipulation activities they involve and the classification of objects to 
determine which is the most permissive kind of action an object can undergo. The 
security policy is then defined by means of the confidentiality and clearance 
relationships: the former is used to maintain negative ACLs for each object whereas 
the latter makes it possible to specify context-dependent user permissions. Finally, 
there is a transition function responsible for analysing all these elements to 
determine whether an operation is safe or not. All these elements are discussed below. 



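Table 1. Specification of the security model 

Element                        Specification 
Subjects                       $S = \{ s_i \mid i = 1, \ldots, n,\ n \in \mathbb{N} \}$ 
Objects                        $O = \{ o_j \mid j = 1, \ldots, m,\ m \in \mathbb{N} \}$ 
Operations                     $Op = \{ op_i \mid i = 1, \ldots, p,\ p \in \mathbb{N} \}$ 
Security Categories            $Sc = \{ sc_i \mid i = 1, \ldots, 3,\ sc_{i-1} \subset sc_i,\ \forall sc_i \in Sc \}$ 
Classification of operations   $\omega : Op \rightarrow Sc$ 
Classification of objects      $\delta : O \rightarrow Sc$ 
Confidentiality                $\psi : O \rightarrow S^n$ 
Clearance                      $\phi : O \times S \rightarrow Sc$ 
Transition                     $\theta : Op \times O^n \times S \rightarrow O^m,\ n, m \in \mathbb{N}$ 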
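
To make the interplay of these elements more tangible, the following sketch represents them as plain data structures together with a simplified security check. It is an illustration under our own assumptions and not the model's actual transition function: the category names `sc1` to `sc3`, the `object|subject` key format, and the rule itself (deny if the subject is on the object's negative ACL, otherwise require the operation's category not to exceed either the object's classification or the subject's clearance) are placeholders chosen for the example.

```javascript
// Security categories ordered from least to most permissive.
// The names are placeholders; only their number and nesting are given.
const CATEGORIES = ["sc1", "sc2", "sc3"];
const level = (category) => CATEGORIES.indexOf(category); // -1 if unknown

// Sketch of a policy store holding the relationships of the model.
function makePolicy() {
  return {
    opClass: new Map(),     // classification of operations: op -> category
    objClass: new Map(),    // classification of objects: object -> category
    negativeAcl: new Map(), // confidentiality: object -> Set of denied subjects
    clearance: new Map(),   // clearance: "object|subject" -> category
  };
}

// Assumed check: identifiers are taken not to contain the "|" character.
function isSafe(policy, op, obj, subject) {
  const denied = policy.negativeAcl.get(obj);
  if (denied && denied.has(subject)) {
    return false; // explicit negative authorization
  }
  const needed = level(policy.opClass.get(op));
  const objectMax = level(policy.objClass.get(obj));
  const granted = level(policy.clearance.get(`${obj}|${subject}`));
  if (needed < 0 || objectMax < 0 || granted < 0) {
    return false; // anything unclassified is denied
  }
  return needed <= objectMax && needed <= granted;
}
```

Because clearances are keyed by object and subject together, the same user can hold different permissions on different objects, which is the context dependence the clearance relationship is meant to capture.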



Subjects. The set of subjects contains all the users of the application since they can 
initiate operations. Users can be organised in groups or in roles depending on the 
security policy that is being specified. In particular, roles represent job functions or 
responsibilities detected in the domain of application and are used to specify RBAC 
policies. Programs or scripts could also be considered as subjects, but as they always 
operate on behalf of a specific user they assume the same security constraints as 
those that apply to the subject who activated them. For example, if a user opens a web 
page containing an applet, the applet is executed only if the user can accomplish the 
actions included in the applet. 

Objects. All the components of a hypermedia application are liable to be considered 
as objects, as long as there are operations to retrieve, edit or personalise them. Even 
subjects can be objects [3], since they can receive the effects of an operation, but not 
conversely. Typical objects in a hyperdocument are nodes, contents, links, 
anchors, programs/scripts or attributes, although the specific list of plausible objects 
depends on the hypermedia model assumed. In any case, not all of them are big enough 
to be considered as objects, especially taking into account that each object has to be 
classified and assigned an n-ACL and a list of clearances (see below) to define the 
security rules. Consequently, designers are responsible for defining a reasonable 
and manageable granularity. Our recommendation is to keep nodes, contents and 
subjects as the only objects in the system, for the reasons set out below. 

Nodes, as abstract containers of information, and contents, as information items 
(e.g. a text, an image or a program), both have existence and meaning by themselves 
and are representative enough to be included in the set of objects. Treating them as 
different elements makes it possible to define multilevel policies where different users 
can get different views of the same nodes, as in [15]. Figure 1 shows an example of 
such multilevel security. In the figure, the hyperdocument is based on a two-level 
architecture where information is delivered by a module, the application manager, 
taking into account the users' requests and the security rules. Thus, User1 and 
User2 obtain different results from their requests to visualise node1, since User2 
cannot see one of the contents tied to the node (Content 2). 





94 Paloma Diaz et al. 



[Figure 1: two views of node 1, one delivered to User 1 and one to User 2, each 
listing the contents visible to that user; Content 1 and Content 3 appear in both.] 

Fig. 1. Multilevel security for hyperdocuments 



Composition mechanisms applied to both nodes and contents can be supported to 
model more sophisticated structures and to ease security management. In 
particular, we consider the use of two abstraction mechanisms: generalisation and 
aggregation [16]. On the basis of these composition mechanisms, the concept of 
domain is introduced, which allows the hierarchical structure starting from an object 
to be used as an object itself. The domain of an object o contains the object, the domains of 
the objects o' aggregated by o and the domains of the objects o" generalised by o (see 
Table 2). For each hyperdocument, there is a root domain that represents the 
hyperdocument itself. 

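Table 2. Definition of a domain 

$\mathrm{domain}(o) = \{o\} \cup \mathrm{domain}(o') \cup \mathrm{domain}(o''),\ \forall o', o'' :$ 
$\quad o' \in \mathrm{aggregated}(o) \text{ and } o'' \in \mathrm{generalised}(o)$ 

$\forall o \in O,\ \mathrm{aggregated}(o) =$ list of objects aggregated by $o$ 
$\forall o \in O,\ \mathrm{generalised}(o) =$ list of objects generalised by $o$ 
Hyperdocument = root domain $\mid \mathrm{domain}(\mathrm{Hyperdocument}) = O$ 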
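
The recursive definition in Table 2 can be illustrated with a minimal Python sketch. The 
function name, the dictionary representation of the aggregation and generalisation 
relations, and the example hierarchy are assumptions made for illustration only; they 
are not part of the model, and the sketch assumes that the hierarchy has no cycles. 

```python
def domain(obj, aggregated, generalised):
    """Return the set of objects in the domain of obj.

    aggregated and generalised map an object to the list of objects it
    aggregates or generalises; objects without children may be omitted.
    Assumes the composition hierarchy has no cycles.
    """
    members = {obj}
    for child in aggregated.get(obj, []):
        members |= domain(child, aggregated, generalised)
    for child in generalised.get(obj, []):
        members |= domain(child, aggregated, generalised)
    return members


# Hypothetical hierarchy: a continent node aggregates country nodes,
# and a country node aggregates its contents.
aggregated = {
    "Europe": ["Spain", "Germany"],
    "Spain": ["Official name", "Weather"],
}
generalised = {}

spain_domain = domain("Spain", aggregated, generalised)
europe_domain = domain("Europe", aggregated, generalised)
```

Under these assumptions, a rule attached to the domain of "Europe" would cover both 
country nodes and every content aggregated by them. 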

Subjects have to be considered as objects insofar as the hyperdocument provides 
operations to manage them; when user management is the 
responsibility of the operating system, they are not included in this set. 

It is obvious that elements that are too fine-grained, such as attributes, should inherit the 
security rules defined for the elements they are tied to, but some controversy can arise 
around anchors and links. We consider that there are two types of anchors: declarative and 
procedural. Declarative anchors are those embedded into a specific area of a 
node or content; in this case, the rules that apply to the object the anchor refers 
to also apply to the anchor. A procedural anchor is specified by means of a rule 
that establishes how to calculate where the anchor has to be embedded. In any case, 
the anchor will be placed into an object that belongs to the root domain, and therefore 
procedurally defined anchors inherit the rules that apply to the hyperdocument. If a 
subject can modify the hyperdocument she can also create a procedural anchor, 
although when the procedure is executed only valid anchors that do not violate any 
security rule are actually created. Thus, User2 in Figure 1 will never be able to 
embed an anchor into Content 2. Even if she finds a rule that resolves to that 
content, the anchor will not be accepted as valid, since that user is explicitly denied 
access. Links can also be specified in a declarative or a procedural 
way. In the first case, a link is a relation between two sets of anchors and is 
managed by assuming the most restrictive of the rules that apply to its sources and 
targets. Thus, a declarative link can be deleted only when all its targets and sources 
can be deleted. Procedural links are treated in the same way as procedural anchors. 

Operations. This set includes all the actions that can be performed on 
the hyperdocument, including the retrieval, editing and personalisation of the different 
components of the hyperdocument. The rules that establish who can 
perform such operations are stated in the definition of the confidentiality and 
clearance relationships (see below). 

As said before, programs are not considered as operations but as contents that in 
turn invoke some operations. Moreover, actions concerning the management of 
security issues (such as updating users or assigning categories and clearances) should 
not be included in this group, since they are critical services whose management can 
compromise the system security and, consequently, they should only be available to 
the security manager. The security manager can be implemented as a single security 
officer, as a group of security officers or as an administrative role, depending on the 
security policy. In any case, we suggest keeping the security manager within a very 
restricted group of users, which should not include all authors of a hyperdocument, since 
its responsibilities are critical to ensure the hyperdocument safety. Nor should the set of 
operations include operations concerning external services (e.g. mail 
or ftp) that are not controlled by the hyperdocument. 

Security Categories. They are abstract permissions that represent types of 
manipulation activities supported by the application. Security categories form the 
basis for defining can-perform relationships between objects and subjects that also 
involve the actions to be executed. They are defined as a finite set of elements 
satisfying a partial order relation, where each category adds privileges for 
manipulating information to the previous one. 

We propose the use of three security categories to distinguish among three 
types of basic activities: the ability to retrieve information (Browsing category); the 
personalisation of elements (Personalising category); and the modification of 
elements, whether to update, create or delete them (Editing category). To create new 
elements, whether nodes, contents, links or any other kind, users require an Editing 
category for the domain they are working on. 

Classification of operations. This mechanism categorises operations according to the 
minimum security category required to execute them. If an operation involves several 
objects, it has to be decomposed into a set of atomic actions, each one affecting only 
one specific kind of object, and the highest manipulation ability required is chosen 
as the operation category. This mechanism is used to implement the least privilege 
principle by assigning to each operation exactly the permissions it requires to accomplish 
its objectives. 
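
As a minimal sketch of this rule, the three categories can be represented as an ordered 
enumeration, and the category of a composite operation computed as the maximum 
over its atomic actions. The Python names used here (Category, classify_operation) 
and the example operation are illustrative assumptions, not part of the model. 

```python
from enum import IntEnum


class Category(IntEnum):
    """Security categories; each one adds privileges to the previous one."""
    BROWSING = 1
    PERSONALISING = 2
    EDITING = 3


def classify_operation(atomic_actions):
    """Return the category of an operation decomposed into atomic actions.

    atomic_actions is a non-empty list with the category required by each
    atomic action; the highest one becomes the operation category.
    """
    return max(atomic_actions)


# Hypothetical operation: retrieve a node, then update one of its contents.
update_content = classify_operation([Category.BROWSING, Category.EDITING])
```

Here the hypothetical operation is classified as Editing, because one of its atomic 
actions requires that category even though the other only requires Browsing. 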

Classification of objects. It defines the most permissive manipulation ability that 
any subject or operation can exercise on an object. This mechanism 
offers more flexibility to define security policies, since it makes it possible to protect 
objects from specific manipulation abilities irrespective of who is involved in the 
action. 

Confidentiality. It is a discretionary mechanism to state which subjects cannot even 
retrieve a particular object. This confidentiality control is done through a negative 
Access Control List or n-ACL using a typical access matrix model [17] that is 
managed in a decentralised way by the owners of the objects. It is important to remark 
that users, as objects, can have their own n-ACL, but its manipulation is a high-risk 
activity that should be carefully controlled to avoid improper and malicious 
modifications of the users' identity or properties. For that reason, only a security 
manager should be able to modify this kind of information. 

Clearance. This is a mandatory mechanism that is used to specify the 
manipulation abilities of the subjects for each object or domain. The actions each 
subject can perform on a hyperdocument depend on the role she plays. Moreover, subjects 
can have different responsibilities on different parts of the hyperdocument and, 
therefore, their manipulation abilities are not unique but context-dependent. Thus, the 
model assigns to each subject a clearance or manipulation ability for each object in 
the hyperdocument. 

To simplify management tasks, it can be assumed that whenever a new object is 
created its author is granted an "Editing" category and all the subjects who are not 
included in its n-ACL are directly granted a "Browsing" category. Conversely, those 
subjects who have no clearance for an object should be directly added to the n-ACL 
of the object. In this way, both negative and positive authorizations can be 
implemented, so that the security manager will use the most appropriate one 
depending on the number of subjects who have to be denied/allowed access. 
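
The default assignment described above could be sketched as follows. This is an 
illustrative assumption about how the defaults might be computed, not the authors' 
implementation; the function name and the user names are hypothetical. 

```python
from enum import IntEnum


class Category(IntEnum):
    BROWSING = 1
    PERSONALISING = 2
    EDITING = 3


def default_clearances(author, subjects, n_acl):
    """Return the clearances assumed for a newly created object.

    Subjects listed in the n-ACL receive no clearance, every other
    subject receives Browsing, and the author receives Editing.
    """
    clearances = {}
    for subject in subjects:
        if subject in n_acl:
            continue  # explicitly denied: no clearance is recorded
        clearances[subject] = Category.BROWSING
    clearances[author] = Category.EDITING
    return clearances


# Hypothetical users: "ana" creates the object and "eva" is in its n-ACL.
new_object_clearances = default_clearances(
    author="ana", subjects=["ana", "ben", "eva"], n_acl={"eva"}
)
```

The security manager can later raise or lower any of these defaults, since the 
clearance relationship is managed as a mandatory mechanism. 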

Designers have to pay special attention to the assignment of clearances to 
manipulate highly sensitive objects, such as the users. The management of users has 
different facets. On the one hand, there are less sensitive information items, such as 
the user name or address, that can be considered as private but whose modification is 
not likely to threaten the hyperdocument security. In such cases, the user herself as well 
as the security manager should be allowed to update them. On the other hand, there 
are other operations that, even though they affect the user, should only be 
managed by the security manager, since they can compromise the system security. 
This is the case of the subject clearances that make up the security policy aimed at 
preserving information integrity, or of the management of new users, groups or roles. 

Transition. It ensures that only operations that do not compromise the system security 
can be executed. To this end, it applies the algorithm shown in Figure 2. 
Depending on how it is implemented, some of these controls (in particular 2 and 3) 
can be done at the same time. 

This algorithm has to be executed each time an operation of the Op set is invoked 
in order to decide whether the operation can be performed or not. Thus, operations are 
transactions only initiated once the transition function has decided that all the required 
actions satisfy the security policy defined for the hyperdocument. 
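
A minimal sketch of such a check is given below. The exact steps of the algorithm in 
Figure 2 are not reproduced here; the three controls and their order are assumptions 
derived from the definitions of the confidentiality, classification and clearance 
relationships, and all names are hypothetical. 

```python
from enum import IntEnum


class Category(IntEnum):
    BROWSING = 1
    PERSONALISING = 2
    EDITING = 3


def is_safe(op_category, subject, objects, object_class, n_acl, clearance):
    """Decide whether an operation may be executed (assumed controls).

    For every object involved, the subject must not be in its n-ACL,
    the operation category must not exceed the object classification,
    and it must not exceed the subject's clearance for that object.
    """
    for obj in objects:
        if subject in n_acl.get(obj, set()):
            return False  # confidentiality: access explicitly denied
        if op_category > object_class[obj]:
            return False  # the object cannot undergo this kind of action
        granted = clearance.get((obj, subject))
        if granted is None or op_category > granted:
            return False  # the subject's clearance is insufficient
    return True


# Hypothetical policy for one user and three contents.
object_class = {
    "official_name": Category.BROWSING,
    "weather": Category.EDITING,
    "capital": Category.EDITING,
}
n_acl = {"capital": {"paloma"}}
clearance = {
    ("official_name", "paloma"): Category.EDITING,
    ("weather", "paloma"): Category.PERSONALISING,
}

can_personalise_weather = is_safe(
    Category.PERSONALISING, "paloma", ["weather"], object_class, n_acl, clearance
)
can_edit_official_name = is_safe(
    Category.EDITING, "paloma", ["official_name"], object_class, n_acl, clearance
)
```

In this sketch the second and third controls are independent comparisons against the 
operation category, which is one way in which they could be evaluated at the same time. 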



3 Applying the Model for a Web-Based Application 

In this section we show a practical example of how security requirements are 
integrated into the development process by describing the experience gained with a 
web-based magazine. The model presented in the previous section is general and can 
be used to specify security policies in terms of elements and abstractions pertaining to 
the hypermedia domain. It can therefore be used to define security rules for 
hyperdocuments, whether or not they are implemented as web-based applications. In fact, 
specific recommendations on how to apply this model in the web environment can 
be found in [18]. 




Fig. 2. Transition function algorithm 



3.1 Description of the Magazine 

"Turismo y viajes" is a Spanish magazine about travelling published in the web by a 
private company. This monthly magazine includes articles, interviews and a number 
of sections that provide access to updated information as well as to already published 
documents. In particular, articles and documents are organised according to different 
criteria that can be useful to find the most appropriate destination for each user. Thus, 
traditional users looking for information on a particular country or region can browse 
sections about countries, whereas users who look for some added value in their 
travelling experiences can use sections such as youth, culture, gastronomy, routes or 
ecology to find destinations that suit their expectations. Additional sections are 
provided to look for advertisements and special offers, related links and tourist 
information offices, or information about grants and courses. 

This magazine is accessed by different kinds of users, including the journalists 
working on the articles and sections of the magazine, travel agencies and a 
heterogeneous group of readers. At present, the web version only provides 
browsing capabilities to the readers, and the management of the web pages is the 
responsibility of the computing section. To speed up the publication process, the 
company intends to move the magazine management into the web environment, so 
that journalists and even travel agencies can publish their information by themselves 
instead of depending on the computer engineers. Such functionality requires 
mechanisms to provide different access abilities to different users, and for this 
purpose a prototype was implemented using the high-level security model 
described in section 2. In particular, the model was applied to the section about 
countries, as described below. 


3.2 Specifying the Security Policy 

The section chosen to apply the security model includes information about 
different countries, organised by continent. The structure of a page is shown 
in Figure 3. The visualisation area is divided into four parts: the "Navigation tools" 
part placed at the top includes the icons of the different sections in the magazine and 
is shared by all the pages in the publication; the "Continents" area contains a 
dynamically created list of the available continents; the "Countries" area includes a 
dynamically created list of the countries of the selected continent for which some 
information can be provided; and the fourth one is the information delivery 
area. For each country the same items are defined, including official name, 
population, area, currency, travelling information (how to go, customs regulations 
and so on), public transportation, accommodation, weather, tourist information 
(tourist offices, shopping, restaurants, etc.) and other useful information. 

To apply the security model, the designers first had to identify the 
subjects, objects and operations. The subjects in this hyperdocument are registered 
users who, in the current version, cannot be grouped. The objects are the nodes and 
contents represented in Figure 3 as rectangles and rounded rectangles respectively. 
The set of operations is quite restricted and only includes the editing and 
personalisation of textual contents. 

Once these sets are defined, the security policy can be specified. This task is the 
responsibility of a security manager, who at any moment can classify the objects and 
grant manipulation abilities to registered subjects. 

Figure 3 provides a graphical example of the classification of objects. As can be 
seen, most of them can be edited, since the magazine is continuously being 
extended with more information, while other items, such as the official name of a 
country, are not likely to change once they are defined and are therefore classified 
as "Browsing". This classification is actually done through an interface 
similar to the one shown in Figure 4. 




Fig. 3. Classifying the objects using the conceptual structure of the hyperdocument 

Fig. 4. Assigning security categories to the objects: (a) security categories of nodes, 
(b) security categories of contents. 

To define the manipulation abilities of registered subjects, the same interface 
combines the confidentiality (the option "Sin acceso") and clearance relationships (the 
options "Navegar", "Personalizar" and "Editar"). The security manager has to define for 
each user what she can do with each node (continent or country) and with each 
content (the information items provided for the countries). Figure 4 shows two screen 
snapshots of the interface for these services. As can be seen, the user "Paloma" is 
assigned a security category for a number of countries (Figure 4.a) and for each 
information item (Figure 4.b); these assignments can be updated by the security 
manager during the system operation to better meet the security requirements. 



3.3 Implementing the Security Mechanisms 

The implementation follows a scheme similar to the one shown in Figure 1. 
Information items are held in a database, over which an application manager is built to 
create the HTML page corresponding to each user request. This page depends not only on the 
user's manipulation abilities but also on the security categories of the objects involved. 
Thus, HTML pages are dynamically created taking into account the security rules 
specified by the security manager (see Figure 5). 

Fig. 5. Page dynamically created applying the security rules 

In particular, each page is composed as follows: 

- The contents that can be retrieved by the user are included. In the example shown 
in Figure 5, some information items of the countries, such as "Superficie" 
(area), "Capital" (capital) or "Idioma" (language), for which the user is explicitly 
denied access (see Figure 4.b), are not presented when the user opens the page 
about Spain. 

- The links whose target can be retrieved by the user are created and shown. In this 
case, the list of European countries (see the left area in the screen of Figure 5) does 
not include a link for Germany ("Alemania" in Spanish) since that user cannot 
access it (see Figure 4.a). 

- Additional links to open personalising and editing tools are placed after the 
corresponding object when required. In Figure 5, those contents for which the 
user is granted a "Personalising" category and which can undergo this kind of 
operation, such as "Clima (temperatura)" (weather) or "Transportes" 
(transportation), have a link labelled "Personalizar" to open the personalisation 
tool. The content "Nombre oficial" has a "Browsing" category (see Figure 3) and, 
since it cannot be modified, no link is provided. Finally, the user is also granted an 
"Editing" category for Spain (see Figure 4.a) but, since there are no items that can 
be edited, the link to the editing tool does not appear. However, since some items 
can be personalised, two links are provided: one to open the personalisation tool 
and modify all the items ("Personalizar") and the other ("Ver basico") to come 
back to the basic version without the personalisations. Since this prototype does 
not support operations at the node level (e.g. adding a country or a content to a 
country), clearances assigned to the nodes refer to their contents, which is not exactly 
the objective of the model. The original idea is to use the node clearance to decide 
whether the node can be modified by setting new links, adding contents or modifying its 
properties. 
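
The composition rules listed above can be sketched as a filter over the contents of a 
node. This is a simplified illustration rather than the prototype's code: the function 
name, the data layout and the per-content "Editar" link are assumptions, and the 
node-level links of the prototype are not modelled. 

```python
from enum import IntEnum


class Category(IntEnum):
    BROWSING = 1
    PERSONALISING = 2
    EDITING = 3


def compose_page(user, contents, object_class, n_acl, clearance):
    """Return the (content, tool links) pairs shown to a user.

    A content is omitted when the user is in its n-ACL or has no
    clearance for it. The tools offered depend on the lower of the
    user's clearance and the classification of the content.
    """
    page = []
    for content in contents:
        if user in n_acl.get(content, set()):
            continue
        granted = clearance.get((content, user))
        if granted is None:
            continue
        effective = min(granted, object_class[content])
        tools = []
        if effective >= Category.PERSONALISING:
            tools.append("Personalizar")
        if effective >= Category.EDITING:
            tools.append("Editar")  # hypothetical label for the editing tool
        page.append((content, tools))
    return page


# Hypothetical policy loosely following the example about Spain.
contents = ["Nombre oficial", "Clima", "Capital"]
object_class = {
    "Nombre oficial": Category.BROWSING,
    "Clima": Category.EDITING,
    "Capital": Category.EDITING,
}
n_acl = {"Capital": {"paloma"}}
clearance = {
    ("Nombre oficial", "paloma"): Category.PERSONALISING,
    ("Clima", "paloma"): Category.PERSONALISING,
}

page = compose_page("paloma", contents, object_class, n_acl, clearance)
```

With this hypothetical policy, "Capital" is left out of the page, "Nombre oficial" is shown 
without any tool link, and "Clima" is shown with a "Personalizar" link. 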



4 Discussion 

Security is a key requirement of any multi-user application and, therefore, rules to 
guarantee a safe and proper use of the system have to be formally defined using 
high-level security models. Since current hyperdocuments tend to be implemented in 
multi-user environments, security should be considered a key requirement, but 
some authors [8; 19] believe that this high-level hypermedia security has not received 
enough attention. Indeed, reference hypermedia models deal with security problems 
following quite different approaches. 

Hypermedia security has frequently been considered as an issue concerning the 
storage level [20], and some hypermedia models like [21; 22; 23] do not cope with this 
kind of problem. Other models like [24; 25; 26] propose discretionary mechanisms 
that allow the owner of a node or link to restrict access to it. In this case, only 
confidentiality is addressed while integrity is not considered. Moreover, such discretionary 
access control is not suitable for applications where security is a main concern [27], 
as happens to be the case in most multi-user applications. There are also works that support 
the definition of stricter mandatory policies, including the multilevel security 
model proposed by Thuraisingham [15], based on a basic hypertext-oriented model 
whose security levels are defined on the basis of the typical privacy levels 
(unclassified, secret and top secret). Instead of using privacy levels concerned with 
confidentiality, what is needed is to define manipulation abilities related to integrity, 
as in [28]. In this case, three authorization levels are considered: browsing, to retrieve 
information; authoring, to modify objects; and usage, to include new objects into the 
nodes. These authorizations do not take into account an essential operation of 
hypermedia systems: the personalisation of the environment. Moreover, this proposal 
does not allow negative authorizations, that is, assuming that users can access 
information unless the opposite is explicitly specified, and such negative 
authorizations can be very useful if there are very few users who cannot even browse 
a node or content. 

The high-level model presented in this paper is aimed at safeguarding information 
confidentiality and integrity. For this purpose it offers the following mechanisms: 

- a negative access control list used to safeguard information confidentiality, which is 
defined as a discretionary and decentralised mechanism; 

- a multilevel mandatory policy based on context-dependent user clearances, which is 
used to maintain information integrity; and 

- a transition function responsible for executing only safe operations. 

Security categories ("Browsing", "Personalising" and "Editing") make up the basis 
of this model, which is assumed by the Labyrinth hypermedia model [16; 29] and is 
integrated into a design method called Ariadne [30] that offers a set of integrated 
products used to specify structural, browsing, procedural, multimedia and security 
features. Although the set of categories can be extended to provide some intermediate 
or higher categories, we consider that the lower bound always has to be 
the ability to browse an object (i.e. the "Browsing" category), since this makes it possible 
to respect the principle of negative and positive authorizations. If the number of users 
who cannot access an object is small, negative authorizations are defined using 
n-ACLs, whereas if it is too large the users who can access the object are specified 
through the clearance relationship, in such a way that a user with no clearance 
assigned is assumed to have no access to the object. 

Some controversy can also arise around the centralised or decentralised 
management of the security rules. We assume that only actions which are not too critical 
can be delegated to the users, and this decision depends on the hyperdocument. In 
general terms, the management of the clearance function should be the responsibility of 
the hyperdocument manager, while the confidentiality control can be delegated to the 
object owners in those hyperdocuments that do not contain secret information. To 
provide users with spaces where they can decide who can do what, personalisations 
for users and groups, as in the Labyrinth hypermedia model, can be implemented. Such 
private hyperdocuments can be directly managed by their owners. 



5 Conclusions 

In this paper we presented a high-level security model providing mechanisms to 
specify the security rules that will govern the usage of a hyperdocument. We showed 
an example of the model usage where it can be seen that designers use the same 
semantic and contextual elements for the design of the structure and navigation 
services as for the specification of security issues. However, the example was quite 
restricted and the model has to be implemented in more complex applications, 
particularly in hyperdocuments providing a more complete set of manipulation 
operations, to analyse whether its implementation is really efficient. Moreover, its 
use in different applications can give rise to some improvements in the model. 
In particular, we are analysing the inclusion of more security categories or classes to 
cover more activities, as well as the definition of groups and roles and how this 
affects the model in terms of authorization propagation. 



Abstract. Web-based Information System (WIS) engineering is more 
complex than traditional Information System (IS) engineering in that it 
raises many new issues such as presentation issues, user profiling, 
navigation support, etc. 

This paper presents a method, that is, a set of product models along with 
process models, for the development of WIS. 

This method adds the dimension of user modeling and customization to 
Web engineering. By capturing the user profiles, the designer is able to 
define user categories and to tune the presentation of the WIS content 
according to the specificity of the user. Besides, by capturing the user 
goals, he is able to define guidelines for navigating in the Hyperspace in 
order to optimize the satisfaction of the user needs. 

Finally, the proposed approach considers, as do many traditional 
software approaches, three different levels of abstraction (conceptual, 
logical and physical). It also clearly separates the management of 
potential users' data, content design, navigational design and interface 
design. Such separation of concerns facilitates many maintenance tasks 
and leads to a higher degree of independence and flexibility. 

Keywords: Web Information System; User Goal-centered Web 
Engineering; Methodology 



1 Introduction 

Because of the rapid development of Web technology and of the increasing 
interest of users and developers, the notion of a Web site is moving from a set of 
HTML pages to a Web-based Information System (WIS). A WIS is an Information 
System providing facilities to access complex data and interactive services through 
the Web. E-business applications, Intranet systems, CRM and supply chain 
applications are examples of WIS. 

Despite this rapid evolution, WIS development is essentially ‘ad hoc’. Developers 
often consider Web development as media manipulation and presentation creation 
rather than traditional IS development. Thus, WIS development does not follow 
well-established engineering principles and, consequently, it is difficult to ensure the 
quality of the resulting product. 

Another important factor is that the Web users’ community is very heterogeneous. 
Many of the so-called ‘Internauts’ navigating on a site might have different goals, 
different backgrounds and knowledge. This leads to usability problems such as "loss 
in Hyperspace" and "cognitive overload", which have been reported in the literature. 

WIS engineering differs from IS engineering in that it involves many new issues 
such as navigation support, presentation issues, user profiling, dynamic adaptation of 
the format and informational content presented to a user, etc. Overall, there is clearly 
a need for a WIS development method that can provide a disciplined 
way-of-working to ensure quality WIS development. There are some concerns in the 
literature about the problems that can occur if WIS development remains ‘ad hoc’ [1], 
[2], [3]. 

The work presented in this paper is an effort in this direction. It aims at 
providing a methodological approach to WIS engineering. This approach is user 
centered and goal driven; its key aim is to adapt the information provided and the 
services offered by a WIS to the individual needs and preferences of its users. 

Like most traditional software approaches, our approach offers three different 
levels of abstraction. At the conceptual level, a conceptual 
navigation schema of the WIS is created. This conceptual navigation schema is then 
transformed into a set of logical pages and links, which enables the actual 
implementation of the WIS in the chosen environment. 

Finally, we allow a clean separation of potential users and goal modeling, content 
design, navigational design and interface design. Such separation of concerns 
facilitates many maintenance tasks, and leads to a higher degree of independence and 
flexibility. 

The different concepts presented are illustrated by excerpts from the well-known 
library domain case. 

The remainder of the paper is organized as follows. Section 2 gives a general 
overview of the proposed approach. Sections 3 and 4 describe, respectively, the User 
Goals Modeling step, the Users Modeling step, and the Web Navigation Conceptual 
Modeling step. Section 5 relates our approach to some relevant previous work. 
Section 6 provides conclusions, some implementation issues and a brief overview of 
future work. 



2 General Overview of the Proposed Approach 

The proposed method is defined as a pair composed of a set of specific data models 
(also called product models) and of process models. The process models represent 
methodological guidelines including a number of different steps, which define the way 
a Web Application Engineer has to follow to create and/or to transform the elements 
of the intended product. The data models provide the concepts needed to 
describe the products. 

Every process model refers to a product model; for example, the process of 
designing the navigation structure is based on the navigation conceptual model. 


2.1 The Product Models 

As shown in Figure 1, the method considers the creation of a WIS according to four 
perspectives: 

- the management of the informational contents, 

- the definition of the navigational structure, 

- the definition of the user interface and, 

- the identification and description of potential users and of their goals. 




Each of these views is supported by appropriate product models that allow the 
WIS engineer to design the system considering each view in isolation from the others: 

• Models that support the Users & users’ Goals view capture (a) information about 
the potential users, such as their background, knowledge, preferences, etc., in order 
to define user categories and (b) usage goals of these potential users. 

• Models related to the Information view are used to model the WIS information 
contents, i.e. the domain knowledge that the WIS will store and return to its users. 

• Models that support the Navigation view deal with the navigation structure of the 
WIS informational contents. They help to structure the Hyperspace as a net of 
nodes and links among them. 

• Finally, models associated with the Interface view deal with the formatting of pages 
associated with the navigation structure. 

Traditional IS development is mainly concerned with the second view 
and, to a certain extent, with the fourth one. Moreover, whereas the last three views are 
mentioned in a number of WIS development research projects [4], the first one is an 
original contribution of the proposed method. By capturing the user profiles, the 
method is able to define user categories and to tune the presentation of the WIS 
content to the specifics of each user profile. Likewise, by capturing the user 
goals, the method is able to define guidelines for navigating in the hyperspace that 
help the user satisfy his goal. 

It should be noted that identifying the four perspectives helps to master the 
complexity of WIS development. The four perspectives are largely independent, 
which allows the WIS engineer to model one view in isolation from the 
others. In addition, the WIS engineer can define several navigational structures over a 
single body of informational content, or present the same navigational structure in 
different ways. The goal is therefore not to create a single Web Information System, but 
a set of views of the same system, each aligned with the individual needs and 
interests of a WIS user. 



2.2 Process Models 



We assert that building a WIS is a six-step process in which each step creates a 
product based on the previous one, and the last step is the actual system 
implementation. This sub-section briefly describes the different steps, 
how they relate to each other and how they contribute to WIS design. Figure 2 
summarizes the process steps, the precedence among them and their major products 
(represented by dark boxes). 






Fig. 2. Overview of the process steps 



2.2.1 Modeling the Potential User Goals 

This step consists of eliciting, classifying and describing the high-level goals that the 
users may achieve when using the WIS. For instance, when using the library WIS, 
the user may decide to consult the library content or to reserve a book copy. 

2.2.2 Modeling the Targeted Users 

In this step, we identify, classify and build a model of the targeted users of the WIS. 
This process is mainly based on the concepts of User Role and User Profile, which will 
be described in more detail in the next section. 

2.2.3 Modeling the WIS Information Domain 

In this step, the information domain schema of the WIS is created. It is composed of 
two parts: the unstructured information part and the structured information part. The 
former refers to a textual, informal description of a concept or idea, such as the textual 
description of the history of the library. The latter is based on well-known 
object-oriented modeling paradigms such as OMT [4] and UML [5]. 

The main concern during this step is to capture the WIS domain semantics as 
neutrally as possible, with very little attention to the users, their tasks or their goals. 
The relevant objects and relationships of the WIS domain are represented. These 
objects and relationships form the basis of the WIS, since many of them will finally 
show up as navigation nodes and links. The product of this step is the Web Based 
Information Conceptual Schema. 

In many situations, the structured information part is tailored or enriched 
from an already available Conceptual Schema of the Domain, for example when the 
target application is a Web interface to an existing database application. 

2.2.4 The Web Navigation Conceptual Modeling 

During this step, the navigational conceptual schema is created as a set of 
navigational views over the information domain that take into account the profiles of 
the intended users. 

A navigational view is called a Way of Navigating (WoN) and represents the best 
navigation method for users having certain profiles to achieve a goal. The set of 
WoNs corresponding to the same goal is represented in a structure that we call a 
Navigational Semantic Unit (NSU). The NSU represents the possible ways in which 
the intended users with a certain role can achieve a high-level goal. These concepts 
will be described in more detail in Section 4. 

2.2.5 The Web Hypertext Logical Modeling 

Once we have defined the navigation conceptual schema and stated how concepts can 
be navigated in the target WIS, the schema must be made perceptible to the user. This 
means defining what the navigational elements will look like. The Web hypertext logical 
modeling step is concerned with these aspects. The designer applies a set of 
conversion rules along with presentation rules to transform each navigational element 
in the navigational conceptual schema into a logical element. The output of this step is 
the hypertext presentation schema. 

2.2.6 The Implementation of The WA 

In this step, the designer builds the actual WIS, by mapping the logical Web hypertext 
elements into concrete interface objects available in the chosen implementation 
environment. We define a set of rules for mapping the logical model to the platform 
of choice. 

3 User Goals Modeling and Users Modeling 

The key point of our approach is to consider a navigation session within a WIS as the 
achievement of a set of goals by the user. Therefore, the WIS should anticipate the 
user’s goals. 

In the following, we describe respectively the user goal modeling step and the user 
modeling step. 



3.1 User Goal Modeling Process Step 



We define a User Goal as an objective that needs to be achieved by the targeted users 
when using the WIS. 




Fig. 3. User Goal Model 



As shown in Figure 3, a User Goal is formally expressed as a clause with a main 
verb and several semantic parameters, where each parameter plays a different role 
with respect to the verb. There are four types of parameters (shown in the gray boxes), 
some of which have sub-types. For example, in the goal "Access to (a book copy)Object 
(by a choice list)Manner (for the novice subscriber)Beneficiary", the target "a book copy" is 
an object because it exists even before "Access" is achieved; the manner "by a choice 
list" defines the way in which the goal has to be achieved; and the "novice subscriber" 
is the beneficiary. For a detailed description of the different parameters, please refer 
to [7]. 
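
As an illustration only, the goal template above can be sketched as a small Python 
structure. The class name UserGoal, its fields and the render method are assumptions 
made for this sketch; only the three parameter roles that appear in the example are 
used, since the full list of four parameter types is given in the figure and in [7]. 

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserGoal:
    """A goal written as a main verb plus role-tagged parameters (hypothetical model)."""
    verb: str
    # Maps a parameter role (e.g. "Object", "Manner") to its wording.
    parameters: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Write the goal in the 'verb (text)Role ...' form used in the example."""
        tagged = [f"({text}){role}" for role, text in self.parameters.items()]
        return " ".join([self.verb] + tagged)


goal = UserGoal(
    verb="Access to",
    parameters={
        "Object": "a book copy",
        "Manner": "by a choice list",
        "Beneficiary": "for the novice subscriber",
    },
)
print(goal.render())
```

Keeping the roles as dictionary keys makes it easy to ask, for instance, who the 
beneficiary of a goal is without parsing the clause again. 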

We distinguish two main types of goals: information-oriented goals and 
application-oriented goals. The former allow the user to access information and to 
navigate through that information; the latter mainly allow the user to perform a 
complex task. For example, "access to a description of a book copy" is an 
information-oriented goal, whereas "the reservation of a book copy" is an 
application-oriented goal. 



3.2 Users Modeling Process Step 

In parallel with or after the goal analysis step, the targeted users of the WIS are 
identified and modeled. This modeling makes it possible to gather the information about 
the users that is needed for later adaptation. It is supported by the Users Model, which 
is presented in Figure 4. 




Fig. 4. Users Model 



As shown in Figure 4, the users model provides three main concepts: User 
Role, User Profile, and Users Category. 

The concept of User Role makes it possible to define the goals which may be of interest 
to a certain category of users, among all those that were described in the previous step 
(see in Figure 4 the relationship between "users category" and "user goal"). For 
example, we can define for the Library system the User Role "Subscriber" associated 
with the goals {"consult book copy", "reserve book copy", ...} and the User Role 
"Visitor" with the goals {"consult book copy", "subscribe to the library", ...}. 

Therefore, the User Role concept expresses the Web users' position with respect to 
the application domain as well as their rights within the WIS. For example, a 
Subscriber can reserve a book copy but a Visitor is not allowed to do so. 
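
The link between roles, goals and rights can be pictured with a minimal Python sketch. 
The names USER_ROLES and is_allowed are hypothetical, and only the goals explicitly 
listed above are included; the elided goals ("...") are left out. 

```python
# Role -> goals that users playing this role may pursue (illustrative data only).
USER_ROLES = {
    "Subscriber": {"consult book copy", "reserve book copy"},
    "Visitor": {"consult book copy", "subscribe to the library"},
}


def is_allowed(role: str, goal: str) -> bool:
    """Return True if the given role is associated with the given goal."""
    return goal in USER_ROLES.get(role, set())


print(is_allowed("Subscriber", "reserve book copy"))
print(is_allowed("Visitor", "reserve book copy"))
```

An unknown role is simply treated as having no goals, which is one possible design 
choice among others. 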

The concept of User Profile aims at capturing the relevant information 
which characterizes a user, such as his experience of the application domain, his 
background and his preferences. This concept is used to determine the best way for a 
user to reach a goal he is allowed to reach. 

As shown in Figure 4, the structure of a user profile corresponds to an aggregation 
of profile parameters. 

A profile parameter describes a particular characteristic of the user, such as his age 
or his size. It can be an elementary parameter or a composed parameter; the latter 
represents an aggregate of profile parameters (elementary or composed). 
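
This aggregation can be pictured as a small composite structure. The following Python 
sketch is only an illustration: the class names, the flatten helper and the sample 
"Background" and "Library experience" parameters are assumptions, and an elementary 
parameter is taken to carry a name and a linguistic value. 

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ElementaryParameter:
    name: str
    linguistic_value: str  # a natural-language value such as "very poor"


@dataclass
class ComposedParameter:
    name: str
    # Parts may themselves be elementary or composed parameters.
    parts: List[Union["ElementaryParameter", "ComposedParameter"]] = field(
        default_factory=list
    )


def flatten(param, prefix: str = "") -> Dict[str, str]:
    """List every elementary parameter under its path in the aggregate."""
    path = f"{prefix}{param.name}"
    if isinstance(param, ElementaryParameter):
        return {path: param.linguistic_value}
    result: Dict[str, str] = {}
    for part in param.parts:
        result.update(flatten(part, prefix=path + "/"))
    return result


profile = ComposedParameter("Profile", [
    ElementaryParameter("Internet Knowledge", "very poor"),
    ComposedParameter("Background", [
        ElementaryParameter("Library experience", "good"),
    ]),
])
print(flatten(profile))
```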

An elementary parameter is described by the pair <Name - Linguistic value>, 
where Name identifies the parameter and the linguistic value expresses 
the value of the parameter in natural-language words, for example 
"Internet Knowledge (very poor)". 

In order to quantify the profile parameter values at run time, the method uses 
fuzzy set theory and fuzzy logic [8]. When defining a profile parameter, the Web 
Application Engineer associates a fuzzy set with each linguistic value of the parameter. 
We call this process "user values fuzzification" (or fuzzy quantification). Figure 5 
shows an example with the Age and the Internet browser usability parameters. Thus, 
thanks to fuzzy set theory, the proposed method provides a user modeling 
approach which is closer to reality, as it allows a WIS to classify the users in fuzzy 
categories, not in a binary manner. 
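
The idea of attaching a fuzzy set to each linguistic value can be sketched as follows. 
This is a minimal illustration that assumes trapezoidal membership functions; the 
labels and breakpoints in AGE_SETS are invented for the example and are not the ones 
shown in the figure. 

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0                    # plateau: full membership
    if x <= a or x >= d:
        return 0.0                    # outside the support
    if x < b:
        return (x - a) / (b - a)      # rising edge
    return (d - x) / (d - c)          # falling edge


# Hypothetical fuzzy sets for the linguistic values of the Age parameter.
AGE_SETS = {
    "young": (0, 0, 25, 35),
    "middle-aged": (25, 35, 50, 60),
    "senior": (50, 60, 120, 120),
}


def fuzzify(value, fuzzy_sets):
    """Return the membership degree of a crisp value in each linguistic value."""
    return {label: trapezoid(value, *points) for label, points in fuzzy_sets.items()}


print(fuzzify(30, AGE_SETS))
```

With these assumed breakpoints, a 30-year-old user belongs partly to "young" and partly 
to "middle-aged", which is exactly the non-binary classification described above. 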




Fig. 5. Illustration of profile parameters values quantification 

Finally, we assume that every user plays a role and has a profile; this association 
defines what we call a Users Category (see Figure 4). We stress that the User 
Role concept and the User Profile concept are orthogonal. For example, we can have a 
Subscriber who uses the Web perfectly and another one who has never used it. 



4 The Navigation Conceptual Modeling Process Step 

The aim of the navigation conceptual modeling step is to operationalize the user goals 
in terms of navigational elements. 

Figure 6 shows an overview of the main concepts of the Navigation Conceptual 
Model together with the relationship with the concepts presented in the last section 
(see gray boxes). 

The central concept provided by the Navigational Conceptual Model is the one of 
Navigational Semantic Unit (NSU), because a Navigational Schema is defined as a set 
of well organized NSUs. 




Fig. 6. The main concepts of Navigation Conceptual Model 

As shown in Figure 6, an NSU is a set of defined navigation methods (which we call 
Ways of Navigating) that represent different alternatives to operationalize the same 
goal. Every Way of Navigating aims at guiding the users of a given profile so that they 
move easily in an adapted navigation sub-space in order to achieve their goals. 

For example, in the Library domain, the operationalization of the goal "To consult 
the Library content" can be expressed by an NSU composed of two Ways of Navigating. 
The first one, adapted to a novice user, offers assistance, whereas the second one, 
which is dedicated to an expert user, offers free use. 

The structure of a Way of Navigating (see Figure 6) is defined as a pair 
composed of: 

- a Navigation Structure, which describes the structure of the navigation sub-space, and 

- a Navigation Guide, which represents navigational guidelines associated with 
the Navigation Structure. 
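
To make these relationships concrete, the sketch below models an NSU as a collection of 
Ways of Navigating indexed by user profile. It is a simplified illustration: the class 
and method names are assumptions, and the Navigation Structure and Navigation Guide are 
reduced to placeholder values. 

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WayOfNavigating:
    profile: str                    # profile this WoN is adapted to
    navigation_structure: str       # placeholder for the navigation sub-space
    navigation_guide: List[str] = field(default_factory=list)  # guideline texts


@dataclass
class NavigationalSemanticUnit:
    goal: str
    ways: Dict[str, WayOfNavigating] = field(default_factory=dict)

    def add(self, way: WayOfNavigating) -> None:
        """Register a Way of Navigating for the profile it is adapted to."""
        self.ways[way.profile] = way

    def way_for(self, profile: str) -> WayOfNavigating:
        """Return the Way of Navigating adapted to the given profile."""
        return self.ways[profile]


nsu = NavigationalSemanticUnit(goal="Consult the Library content")
nsu.add(WayOfNavigating("novice", "assisted navigation", ["Start from the categories"]))
nsu.add(WayOfNavigating("expert", "free navigation"))
print(nsu.way_for("novice").navigation_structure)
```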

In the following, we will respectively describe the concept of Navigation Structure 
and the concept of Navigation Guide. 








4.1 The Concept of Navigation Structure 

As shown in Figure 6, a Navigation Structure defines a navigation view over the 
application domain. It is represented by a network of Navigational Nodes (NNs) and 
Navigational Links (NLs). The NNs contain the information; the NLs show semantic 
and structural relationships between the information contained in the nodes. Figure 7 
shows, as an example, the graphical representation of the Navigation Structure 
operationalizing the user goal "(Consult)Verb (a book copy description)Object". This 
structure lets the user obtain a book copy description either through a form or through a 
choice list. 




Fig. 7. An example of Navigational Structure 



4.2 The Concept of Navigation Guide 

The concept of Navigation Guide is related to that of Navigation Structure. As 
shown in Figure 8, it is defined as a set of navigational guidelines which help 
the final user to explore the navigation structure more effectively and 
prevent him from getting lost. 

The approach we use to represent the navigation guidelines is based on the process 
meta-model proposed in [9]. A navigational guideline is characterized by a signature 
and a body. The signature defines the applicability conditions of the 
navigational guideline. As depicted in Figure 8, it is defined as a pair <navigation 
situation, navigation intention>. The navigation situation refers to the situation in 
which the navigation guideline can help the user satisfy his navigation intention (see 
the example in Figure 8). The body (see the example in Figure 8) of the navigation 
guideline contains the guidance itself: it suggests how to progress at a given point of the 
navigation structure. It can be regarded as a structured module of knowledge that 
supports navigation decision making in the navigation space. 
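
The signature and body of a guideline can be illustrated with a short Python sketch. 
The class name, the exact-match lookup and the sample guideline texts are assumptions 
made for the illustration; the example given in the figure is not reproduced here. 

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NavigationalGuideline:
    situation: str  # signature, part 1: where the user currently is
    intention: str  # signature, part 2: what the user wants to achieve
    body: str       # suggestion on how to progress from this point


def applicable(guidelines: List[NavigationalGuideline],
               situation: str, intention: str) -> List[str]:
    """Return the bodies of the guidelines whose signature matches."""
    return [g.body for g in guidelines
            if g.situation == situation and g.intention == intention]


# Hypothetical guidelines for a library WIS.
GUIDE = [
    NavigationalGuideline("Search_Result", "read a book description",
                          "Select a title in the result list."),
    NavigationalGuideline("Search_Result", "refine the search",
                          "Go back to the keyword form and add a keyword."),
]
print(applicable(GUIDE, "Search_Result", "refine the search"))
```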

Fig. 8. The concept of Navigation Guide 



4.3 The Process of NSU Definition 

Due to the structure of an NSU, the process of its definition is incremental and 
recursive. It consists of two main steps: the reduction of the associated 
user goal into operational goals, and the construction of the WoNs derived from the 
operational goals. 

4.3.1 Step 1: User Goal Reduction 

The first step a Web Application Engineer may follow when he/she plans to define an 
NSU associated with an intended user goal is to reduce this goal to operational goals 
(goals with enough detail for the user needs to be more easily understood [10]). This 
reduction process is based on the defined User Profiles, and leads to a number of 
AND/OR reduction graphs (one per profile). 

Let's exemplify this step with the goal Consult the Library Content, and suppose 
that we need to deal with the two profiles Novice User and Experienced User. 
Figure 9 depicts the goal reduction graphs obtained. 
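
An AND/OR reduction graph can be sketched as a tree whose leaves are operational goals. 
The following Python illustration assumes a tree-shaped graph; the GoalNode class, the 
alternatives function and the goal wordings are hypothetical, and the graphs of the 
figure are not reproduced. 

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class GoalNode:
    name: str
    kind: str = "LEAF"  # "AND", "OR" or "LEAF" (operational goal)
    children: List["GoalNode"] = field(default_factory=list)


def alternatives(node: GoalNode) -> List[List[str]]:
    """Return every alternative way to satisfy the goal, as lists of leaf goals."""
    if node.kind == "LEAF":
        return [[node.name]]
    child_alts = [alternatives(child) for child in node.children]
    if node.kind == "OR":
        # Any single child suffices: collect the alternatives of all children.
        return [alt for alts in child_alts for alt in alts]
    # AND: pick one alternative per child and concatenate them.
    return [sum(combo, []) for combo in product(*child_alts)]


# Hypothetical reduction for an experienced user.
expert_graph = GoalNode("Consult the Library Content", "OR", [
    GoalNode("Consult by keywords"),
    GoalNode("Consult by categories"),
])
print(alternatives(expert_graph))
```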

4.3.2 Step 2: Deriving Navigation Structures and Navigational Guidelines 

The second step consists in respectively deriving the Navigation Structures and the 
Navigation Guidelines from the goal reduction graphs. 

Fig. 9. Example of Goal reduction graphs 




Fig. 10. Example of Navigation Structure derivation 

Figure 10 presents, as an example, the Navigation Structures derived from the two 
reduction graphs shown in Figure 9. The first Navigation Structure suggests that the 
Novice Users may navigate within five Navigational Nodes: Proposed_Categories, 
Navigator_Help, Keywords_Search, Search_Result, and Book_Description. The users 
enter the structure by the node Proposed_Categories, from which they can navigate 
through sub-categories or by using keyword search. The second Navigation Structure 
provides two alternative ways to Experienced Users: consult the library content by 
keywords or by categories. So the two structures provide different ways of 
achieving the same goal. 
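
A Navigation Structure of this kind can be represented as a directed graph. In the 
sketch below the five node names and the entry node come from the text above, but the 
links between the nodes are assumptions made for the illustration, as is the 
reachability check. 

```python
from collections import deque

# Node -> nodes reachable through one Navigational Link (links are assumed).
NOVICE_STRUCTURE = {
    "Proposed_Categories": ["Navigator_Help", "Keywords_Search", "Search_Result"],
    "Navigator_Help": ["Proposed_Categories"],
    "Keywords_Search": ["Search_Result"],
    "Search_Result": ["Book_Description"],
    "Book_Description": [],
}


def reachable(structure, entry):
    """Breadth-first search: return the set of nodes reachable from the entry node."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for target in structure[node]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# A simple sanity check: every node should be reachable from the entry node.
print(reachable(NOVICE_STRUCTURE, "Proposed_Categories") == set(NOVICE_STRUCTURE))
```

Such a check is one way of detecting nodes that a user of the given profile could 
never reach. 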



5 Related Works 

A number of researchers have already recognized the lack of methods and 
tools for Web site engineering and have proposed a set of methodological approaches. 

Some methods, such as HDM [11] and OOHDM [12], were originally designed for Hypertext 
or Hypermedia applications, and have been adapted to deal with Web-specific issues. 
An example is HDM-lite [13], a conceptual model for Web applications which 
is an evolution of HDM. Some methods, such as W3DT [14], are very 
implementation-oriented. Others, such as RMM [15] and Araneus [16], have their origin 
in database design methods. 

All the above methods are heavily data-driven and may be able to solve some 
maintenance problems similar to those which appear in databases, such as redundancy, 
inconsistency and incompleteness. However, they pay less attention to user adaptability 
issues. 

WSDM [17] and Torii [18] are closer to our vision. With respect to WSDM 
and Torii, the proposed method adds to Web modeling the dimension of user 
modeling. The main specificity of our method is that it is goal-driven. 



6 Conclusion, Implementation Issues and Future Work 

In this paper, we have presented a method for engineering WIS that adapt their 
content, their navigational structure and their interface to the needs of the users. 
Therefore, the method is user centered and user goals driven. 

Like traditional software approaches, the method considers the WIS creation at 
three different levels of abstraction. We also recognized that separating the 
management of the user information from the design of the informational content, 
the navigation structure and the presentation facilitates many WIS management tasks 
and leads to a higher degree of independence and flexibility. 

Today, there are several approaches available to implement the ideas discussed in 
this paper. The most popular are: 

• Common Gateway Interface (CGI), a technical specification which makes possible 
a more complex interaction between Web clients and servers. 

• Cookies, which are small amounts of structured data that are shared between a Web 
server and a user's browser. Cookies give the server information about a user's 
identity, preferences or past behavior. 
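
As a hedged illustration of the cookie mechanism, the sketch below uses Python's 
standard http.cookies module to write and read two profile-related values. The cookie 
names role and internet_knowledge and the two helper functions are assumptions for the 
example; this is not the implementation used by the method. 

```python
from http.cookies import SimpleCookie


def build_profile_cookies(role: str, internet_knowledge: str) -> str:
    """Build the Set-Cookie header lines a server could send to the browser."""
    cookie = SimpleCookie()
    cookie["role"] = role
    cookie["internet_knowledge"] = internet_knowledge
    return cookie.output(header="Set-Cookie:")


def read_profile_cookies(raw_cookie_header: str) -> dict:
    """Parse the Cookie header sent back by the browser into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(raw_cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}


print(build_profile_cookies("Subscriber", "poor"))
print(read_profile_cookies("role=Subscriber; internet_knowledge=poor"))
```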

At the moment, however, we rely on the Active Server Pages (ASP) technology 
developed by Microsoft. ASP files contain server-side statements in Visual Basic syntax. 

Ongoing research aims at defining a CAWE (Computer Aided Web Engineering) 
environment to support the development process from conceptual modeling to the 
deployment of the WIS on the Web. 

Abstract. Developing e-business solutions is a complicated task. It 
involves many different disciplines and requires knowledge of 
e-business technologies as well as business processes. In this paper we 
look into what causes this complexity, and discuss an approach to 
overcome the current barriers in e-business engineering. 

The approach combines knowledge of processes and technology with a 
new e-business engineering methodology, called Rapid Services 
Development (RSD). The ingredients of RSD are discussed in detail, 
and linked to current engineering approaches. 



1 The Complexity of e-Business Engineering 

Smooth, quick, and flexible development of e-business services and systems is an 
inherently and extremely complicated undertaking. Current practice has not succeeded in 
adequately tackling this complexity. Despite an overwhelming amount of literature on 
the subject, the awareness of its business potential is limited. Business innovation in 
this area is still largely driven by the availability of technology and standards, instead 
of business need governing technology deployment [15]. It is well known that the 
introduction of technology without change of business rarely leads to productivity 
improvement [4]. The technology introduced should be determined by business needs, 
though new technology can inspire business opportunities. The business processes 
and chain or network co-operation should lead the development of electronic 
transaction services. 

Besides this, business transactions are diverse — they differ from company to 
company — and they are ever changing. Current ICT technology and its 
standardisation do not support the required transaction diversity and transaction 
flexibility. This has hampered the emergence of a true electronic transaction 
infrastructure, has kept electronic transaction investments costly and risky, and has 
kept many businesses from exploiting the potential of electronic transactions. When 
in fact implemented, electronic transactions have proven to be a restraining force in 
many cases, dictating business operations instead of following business needs. 

Also, e-business engineering has many integrating aspects. It integrates businesses, 
it integrates internal business processes with electronic transactions, it integrates 
paradigms, it integrates new transactions with existing systems in place, and it 
integrates an enormous set of technologies, ranging from data communication 
technology to EDI, from database technology to SET, from Internet to OEX. Such 
complexity cannot be managed without an explicitly structured approach, based on 
architectural reasoning. For basic e-commerce applications such architectures are 
being developed [13]. New approaches to system development, such as 
component-based development [11] and frameworks, are promising, but barely support 
cross-company co-operation (e.g. the San Francisco framework [9]). 

Nambisan and Wang [14] argue that web technology can be adopted at three 
different levels: information access, work collaboration, and core business 
transactions. Reaching the highest level of adoption is the most profitable, yet the 
most difficult. Three different knowledge barriers hinder adoption: 

• technology-related knowledge is missing, as web technology is barely mature and 
is changing fast; 

• process and methodological related knowledge is insufficient, as e-business 
requires different approaches than classical systems engineering and requires 
specific combinations of resources; 

• application related knowledge is hard to obtain, as e-business applications cross 
organisational and intra-organisational boundaries. 

Summarising, in order to get a grip on the development of new electronic 
transactions and transaction services for networked enterprises, one needs methods 
and tools that: 

• start the development of transaction services supporting cross-company 
co-operation from a business perspective; 

• allow the consequences and prerequisites of technology for business 
processes and cross-company co-operation to be assessed; 

• link co-operation between companies to internal business processes and existing 
(legacy) systems; 

• effectively allow knowledge on standards and available components to be 
gathered and re-used, preferably supporting component-based development and 
re-use; 

• are strongly based on an architecture for electronic commerce and business 
functions to tackle the complexity of the field. 

Many partial answers to these requirements are available. This paper describes the 
approach that is being developed within the Giga Transaction Services project at the 
Telematics Institute. It aims at overcoming all three knowledge barriers identified by 
Nambisan and Wang by providing three building blocks for e-business engineering: 

• Knowledge on the current technological state of affairs in e-business 
components and standards; 

• Knowledge on business processes and building blocks for e-business; 

• A framework for systematic development of e-business transaction systems, 
using this knowledge. 

These three building blocks together will provide an integral approach to 
e-business engineering. The emphasis here is on the engineering framework, called 
Rapid Service Development. The other elements are discussed briefly in Section 3. 

2 Rapid Service Development 

Usually the design of e-business services and systems involves a multidisciplinary 
team that interviews and holds discussions with users, experts, and business consultants. 
During the design process a method emerges, rather than being specified in 
advance. Once the job is done, it seems to many that the "method" followed was 
erratic and inefficient; but this is only in retrospect, since during the job there was 
hardly an alternative. A field with which systems design has often been compared, 
namely classical architecture and construction work, seems to have a similar lack of 
structured methods. 

We follow Checkland [3], in the sense that what we want to achieve is not a 
straightforward development method, but a development methodology (also see 
Hirschheim, Klein [8]): a framework that enables a structured process (debate/dialog), 
supported by tools, which leads to a method for the development of transaction 
services on the basis of existing components. 

The RSD methodology defines an integrated framework for the specification and 
development of e-business services, with a particular focus on business-to-business 
transactions. Generally, the development of such services is highly complex, as it 
involves many different aspects ranging from high-level strategic business concerns 
to low-level protocol definitions. In order to deal with this complexity, the ‘separation 
of concerns’ principle is applied. The RSD framework distinguishes seven different 
aspect areas, called cornerstones, from which models and specifications can be made. 
In this way, one can focus on one set of concerns at a time, resulting in a lower 
(perceived) complexity. At the same time, however, the methodology provides 
well-defined links between the cornerstones, thus rendering an integrated framework for 
business-driven design of transaction services. 

The RSD framework has been inspired by two sources. First, there is the context of 
information systems engineering with a vast amount of literature on methodologies. 
Second, experiences with an initial RSD method applied to several cases have 
provided much feedback. 



2.1 Cornerstones in RSD 

The seven cornerstones of RSD are structured along two dimensions, as illustrated 
in Fig. 1. Firstly, we distinguish between business-oriented models (on the left) and 
technology or system oriented models (on the right). Secondly, models can vary in 
scope or granularity. Along this axis, they range from high-level, broad scope, coarse 
grained, little detailed models (at the top) to low-level, narrow scope, small grained, 
highly detailed models (at the bottom). In addition, there is an implicit third 
dimension, the development dimension, ranging from analysis, through design, to 
realisation, that is not visualised here. The third dimension is orthogonal to the other 
two: development can take place in any of the cornerstones. The ultimate goal is not 
only a technological solution (system realisation), but one that is integrated with the 
realisation of the business models as well. 

Fig. 1. The RSD framework 



On the business-oriented side, we model and design the way organisations 
co-operate in a networked enterprise. To this end, we use 

• networked enterprise models, which give a high-level view of the 
co-operation in terms of the actors involved, the roles they fulfil and their 
relationships, but also in terms of the business functions they perform and the 
flows between these, 

• transaction scenarios, which describe the inter-organisational processes and 
transactions that achieve the co-operation, and 

• specifications of the procedures, which describe in more detail the interactions 
and message formats for conducting transactions between organisations. 

On the system-oriented side, we model and design the technology that supports the 
co-operation between organisations in a networked enterprise. To this end, we use 

• system descriptions, which describe the architecture and the composition of 
transaction systems, 

• component specifications, which describe the functionality of 
well-encapsulated, reusable pieces of software, and 

• protocol and code specifications, which describe in detail the protocols used 
for communication between components, as well as the inner workings of these 
components where necessary. 

Ambition and scope can be formulated both in terms of business goals and in terms 
of technology. The crossover from business-oriented modelling and design to 
system-oriented modelling and design is somewhere between procedures and protocols. Here 
the division between the two ends of the spectrum gets blurred. At the higher levels of 
abstraction, there is a more pronounced gap (also visualised in Fig. 1) between 
business modelling and technology-oriented modelling. 

2.2 Methodologies for e-Business Engineering in RSD 



The cornerstones as identified above do not provide sufficient guidance in developing 
e-business systems. They form the map that identifies all possible locations, yet they 
do not provide the shortest, best, or even any, route to come from a given starting 
point to operational systems. 

Several well-known methods can be used in this framework. The classic waterfall 
model could be employed: starting with Ambition & Scope and the business-oriented 
models for requirements analysis and definition, and then through the 
technology-oriented models for implementation, integration and deployment (see Fig. 2). 







Fig. 2. A waterfall / top-down approach in RSD 



However, especially in a multidisciplinary field such as e-business engineering, a 
waterfall approach is known not to work well. A more iterative approach, as advocated by, 
for example, Booch [2], would emphasise the lower-level cornerstones of RSD, roughly 
cycling from business/model-oriented parts to system-oriented building blocks. 
Below, we sketch two approaches that we have used so far and that have proven to be 
of some value. The first approach emphasises top-down design; the second approach 
is aimed at integrating legacy systems. 

2.2.1 A Generic Approach 

The typical approach in any given case would roughly follow these steps: 

1. Identify the goals, side-conditions and scope of the design (scope and ambition); 

2. Identify the actors involved: what parties should be taken into account in the 
new e-business application? 

3. Identify the roles that are present: what are the different responsibilities in the 
process, and what actor can perform what role? 

4. Determine how goods, information and money flow through different roles. 

These first four activities define the ambition and scope and sketch the networked 
enterprise model of the e-business services. The next step makes the transition to 
transaction scenarios. 

5. Determine the activities underlying the flows, and assign the activities to 
different roles. 

When activities are related to different roles and actors, they are called 
transactions: interactions between different actors. From these we choose those that 
will be developed and implemented further. The approach is visualised in Fig. 3. 




Fig. 3. A generic approach in RSD 

6. Determine what restrictions current processes, systems and ICT architectures 
and applications impose. 

7. Determine the technologies most suited for different transactions, depending on 
the level of openness, security restrictions, implementation frameworks and so 
on. 

8. Refine the transactions to such a level that they can be implemented and 
combined with existing systems and processes. 

We have now arrived at the most detailed level of e-business specification, which serves 
as a starting point for system development, and we have ensured that the specifications 
given are in line with the systems, components and procedures currently available. 

9. Build the e-business system components, and integrate them into a running 
system. 

This approach has been applied in a number of case studies. Nevertheless, it still 
requires considerable validation and improvements, and has to be adapted to the needs 
of every specific situation. 
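
Steps 2 to 5 of the approach can be sketched as a small data model. The following Python fragment is only an illustration under stated assumptions: the actor and role names (BookShop, Customer, Bank) are hypothetical, and treating a transaction as an activity whose flow crosses an actor boundary is our reading of the definition given above, not a formal part of RSD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    kind: str    # "goods", "information" or "money"
    source: str  # role that sends
    target: str  # role that receives


@dataclass(frozen=True)
class Activity:
    name: str
    role: str    # role the activity is assigned to (step 5)
    flow: Flow   # flow that the activity underlies


# Step 3: which actor performs which role (hypothetical example data).
role_to_actor = {
    "seller": "BookShop",
    "buyer": "Customer",
    "payment handler": "Bank",
    "shipper": "BookShop",
}

# Step 4: flows of goods, information and money between roles.
flows = [
    Flow("information", "buyer", "seller"),     # order
    Flow("money", "buyer", "payment handler"),  # payment
    Flow("information", "seller", "shipper"),   # shipping request
    Flow("goods", "shipper", "buyer"),          # delivery
]

# Step 5: activities underlying the flows, assigned to roles.
activities = [
    Activity("place order", "buyer", flows[0]),
    Activity("pay", "buyer", flows[1]),
    Activity("request shipment", "seller", flows[2]),
    Activity("deliver", "shipper", flows[3]),
]


def transactions(activities, role_to_actor):
    """Return the activities whose flow connects two different actors."""
    result = []
    for activity in activities:
        sender = role_to_actor[activity.flow.source]
        receiver = role_to_actor[activity.flow.target]
        if sender != receiver:
            result.append((activity.name, sender, receiver))
    return result


print(transactions(activities, role_to_actor))
```

In this example "request shipment" is not reported, because both roles involved are played by the same actor; it remains an internal activity rather than a candidate transaction.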

2.2.2 Legacy Integration 

It is rare that an e-commerce operation is started from scratch. When making the 
transition to conducting business electronically, most organisations will need to take 
existing systems and processes into account. Such so-called legacy systems usually 
represent a considerable investment, which makes it economically unfeasible and risky
to replace or redesign them, despite their likely inflexibility and incompatibility with
current standards. Therefore, the reuse and successful integration of legacy systems in
the new business infrastructure is of vital importance to businesses engaging in
e-commerce. The problem is aggravated by the fact that developments in e-commerce
are very fast, making today's state-of-the-art solutions the legacy systems of
tomorrow.

In this section, we discuss the role of legacy integration in the RSD methodology.
RSD uses a methodology for legacy system integration called BALES [7]. The main
objective of BALES is to parameterise business objects with legacy objects. Legacy
objects serve as conceptual repositories of extracted (wrapped) legacy data and
functionality. The BALES methodology essentially comprises three phases: forward
engineering, reverse engineering and meta-model linking. The forward engineering
phase transforms a conceptual enterprise model into a predefined meta-model, which
serves as a basis for comparison between business and legacy object models. In the
reverse engineering phase, legacy objects and processes are also represented in the
same meta-model. In the link phase, the descriptions of both the forward- and
reverse-engineered models are connected in order to ascertain which
parts of the legacy object interfaces can be re-used within new applications. During
this phase, queries are used to infer potential legacy components that may be linked to
business components.
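
The link phase can be pictured as a query over two models expressed in one common form. The sketch below is not the BALES meta-model, which is not detailed here; it assumes, purely for illustration, that every object is described by a set of operation signatures, and all object and operation names (Order, ORDSYS, BILLING and so on) are hypothetical.

```python
# Assumed common meta-model: object name -> set of operation signatures,
# each signature being (operation, input types, output type).
business_objects = {
    "Order": {
        ("create", ("CustomerId", "ItemId"), "OrderId"),
        ("status", ("OrderId",), "Status"),
    },
    "Invoice": {
        ("issue", ("OrderId",), "InvoiceId"),
    },
}

# Result of reverse engineering, expressed in the same assumed form.
legacy_objects = {
    "ORDSYS": {
        ("create", ("CustomerId", "ItemId"), "OrderId"),
        ("cancel", ("OrderId",), "Status"),
    },
    "BILLING": {
        ("issue", ("OrderId",), "InvoiceId"),
    },
}


def link(business, legacy):
    """For each business operation, list the legacy objects that offer
    an operation with an identical signature."""
    links = {}
    for business_name, operations in business.items():
        for signature in sorted(operations):
            providers = sorted(
                legacy_name
                for legacy_name, legacy_operations in legacy.items()
                if signature in legacy_operations
            )
            links[(business_name, signature[0])] = providers
    return links


for key, providers in link(business_objects, legacy_objects).items():
    print(key, "->", providers)
```

Operations with an empty provider list, such as the status query on Order in this example, are the parts that would have to be newly developed rather than reused.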

Fig. 4 illustrates the processes for legacy integration in RSD. Most likely, the
legacy systems that are to be reused have been identified in the RSD ambition and
scope cornerstone. Based on these requirements, the relevant parts of the
business-oriented models are selected and transformed into the enterprise meta-model of
BALES. The reverse engineering techniques can be used to extract reusable legacy
objects from the code for the legacy systems. The code for these systems can be found
in the protocols cornerstone. The legacy objects that result from the reverse
engineering phase may be positioned within the components cornerstone, as these can
be seen as reusable components. The link phase should be positioned between the
business-oriented cornerstones and the components cornerstone. The linking
techniques should be used to link reusable legacy objects to the business processes
and business objects in the enterprise model derived earlier. Linking eventually
results in a new integrated system consisting of business objects that may contain
legacy objects.

Fig. 4. Legacy integration in RSD

2.3 Related Engineering Approaches to Systems Design

Many systems engineering methods have been developed over the last decades. These
approach systems from three different perspectives:

• Information engineering; 

• Business process engineering; 

• Inter-organisational systems engineering. 

In the first category, Zachman has been very influential. In searching for an 
objective, independent basis upon which to develop a framework for information 
systems architecture Zachman [16] was inspired by the field of classical architecture 
itself. In “building” architecture there is a whole set of architectural representations 
prepared during the process of building. Zachman reasons that an analogous set of 
architectural representations is likely to be produced in building any complex product. 
The framework that Zachman finds by analogy consists of six representations and 
three different types of descriptions. The six representations are scope, model of the 
business, model of the information system, technology model, detailed description, 
actual system. The three descriptions are based on the questions what, how and where 
and represent the data, process and network descriptions respectively. In this way
Zachman has divided the “architectural space” into a two-dimensional matrix having a
representation axis and a description axis.
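
The two-dimensional matrix can be written down directly as a data structure. The following minimal Python sketch uses only the six representations and three descriptions named above; the single cell entry filled in at the end is a hypothetical example, not taken from Zachman.

```python
# Representation axis: the six architectural representations.
REPRESENTATIONS = [
    "scope",
    "model of the business",
    "model of the information system",
    "technology model",
    "detailed description",
    "actual system",
]

# Description axis: question -> type of description.
DESCRIPTIONS = {"what": "data", "how": "process", "where": "network"}


def empty_framework():
    """Create one empty cell per (representation, description) pair."""
    return {
        (representation, description): None
        for representation in REPRESENTATIONS
        for description in DESCRIPTIONS.values()
    }


framework = empty_framework()
# Hypothetical cell content, for illustration only.
framework[("scope", "data")] = "list of things important to the business"

print(len(framework), "cells")
```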

Although this is an excellent theoretical framework for organizing and controlling the
requirements, it does not provide a method for designing information systems. How to
use the framework, for example in what order, to arrive at a design is not
specified. Further, it is questionable whether such a framework, although useful
for analysis, is actually practical for the design of information systems. For example,
the large number of different cells, each with a single topic, view, and model, places
too much emphasis on the separation of the system into parts/aspects rather than the
integration of the different aspects/parts into a single system.

Moving further towards an engineering method, Information Engineering (IE) has been
widely accepted. IE is a methodology for developing information
systems that emerged in the late 1970s and early 1980s. The ideas were primarily
conceived by Clive Finkelstein and James Martin. Since then, a number of
variants of the approach have been developed, including Rapid Application
Development (RAD) [1].

IE is a comprehensive methodology, covering all phases of the life cycle. The 
underlying philosophy is that data are at the heart of an application and that the data 
types are considerably more stable than processes. Moreover, IE takes the viewpoint
that the most appropriate way to communicate is through diagrams.

The primary IE model has three elements: data, activity and the interaction of data 
and activities. All three of them can be at different levels of detail. It is a top-down 
methodology, and begins with a top-management overview of the enterprise as a 
whole. In the steps of the methodology one adds detail and zooms in to the relevant 
areas of the systems. The methodology identifies four levels of detail:

• Information strategy planning: has the objective to construct an information
architecture. This is done at the enterprise level and includes identification of
relevant business areas;

• Business area analysis: the objective is to understand the business areas and to
determine their system requirements;

• Systems planning and design: establishes the behaviour of the systems in a way
that the users want and that is achievable using technology;

• Construction and cutover: builds and implements the systems as required by the
previous levels.

In the field of business engineering and re-engineering a lot of work has been 
going on as well. Within the Telematics Institute we have developed, on the basis of 
many other methods, a method plus supporting tools in the Testbed project. The 
Testbed project develops a systematic approach to handle change of business 
processes, particularly aimed at processes in the financial service sector [5]. A main 
objective is to give insight into the structure of business processes and the relations 
between them. This insight can be obtained by making business process models that 
clearly and precisely represent the essence of the business organisation. These models 
should encompass different levels of organisational detail, thus making it possible to find
bottlenecks and to assess the consequences of proposed changes for the customers and
the organisation itself. Formal methods allow for detailed analysis of models and tool
support in this process.

Fig. 5. A model-based approach to business process change

As shown in Fig. 5, business process models can be used for analysis and
manipulation of business processes without having to actually build these processes
first. This model-based approach makes it possible to identify the effects of changes before
implementing them. The models constitute an important means for the preparation
and actual implementation of organisational and IT change.

The Testbed method guides the engineer through the analysis, modelling and re- 
engineering phases of the project. Thus high-level management objectives are 
translated into operational steps and concrete modelling, analysis and redesign 
objectives. One of the most important contributions of Testbed is that it has brought a
systematic approach to business modelling and analysis. The method is one of the key
aspects in this, combined with a language that is very well suited for its purpose.
Testbed, however, is not aimed at system development for business processes, but
rather at the analysis of processes. No concrete approach for system design as such is
given.

As for inter-organisational design, the case is much less clear. There has
been some work on characterizing the problems in this area [12], but we are not aware
of systems engineering approaches in this area, despite the overwhelming amount of
management literature on the subject.

The Reference Model for Open Distributed Processing (RM-ODP) (ISO 1995 [10])
is an international standard jointly developed by the International Organization for
Standardization (ISO) and the International Telecommunication Union (ITU). The
objective is to provide a unifying architectural framework for the standardisation and 
construction of distributed systems. 

The RM-ODP identifies five viewpoints from which to specify ODP systems, each 
focusing on a particular area of concern. It is claimed that the ODP viewpoints form a 
necessary and sufficient set to meet the needs of ODP standards. Viewpoints can be 
applied, at an appropriate level of abstraction, to the specification of a complete ODP 
system or to the specification of individual components of an ODP system. 

The RM-ODP is a reference model for distributed processing; it is not a method for 
system development. Its viewpoints provide a framework for modelling distributed 
systems, but there is no prescribed way of moving through the viewpoints. However, 
the separation of concerns provided by the viewpoints will be of importance in any 
ODP-based development process. As yet, the RSD does not have a prescribed or 
preferred development process either. Currently, it identifies a number of 
cornerstones that may be passed through in an unspecified order during the 
development process. 



3 Component Based e-Business Engineering 

A methodology in itself is useful, yet it misses an essential ingredient: domain knowledge.
As argued in the introduction, process knowledge is only one of three barriers to
overcome in effective e-business engineering. The two other barriers are the
technological knowledge barrier and the application knowledge barrier.

To overcome these two barriers, we develop two different means: an e-business 
transaction / networked enterprise library, and a technology library. 



3.1 Process Components 

Application knowledge is crucial in e-business. Without a good overview of possible 
business models, their consequences for roles, processes, transactions and ICT, it is 
very difficult to keep a grip on e-business engineering. Such a good overview is 
largely a matter of experience in the e-business field. Although experience is crucial,
its transfer can be supported more effectively than
is currently the case. A means of doing so is building an e-business “process library,”
related to the different cornerstones in RSD. It provides generically applicable
components and processes that can easily be specialised to specific applications.

As an example take an auctioning process. There are many types of auctions. What 
all auctions have in common, though, is that: 

• there are three roles involved: seller, buyer, and auction; 

• auctions receive offers from sellers and bids from buyers;

• auctions carry out three functions: opening an auctioning event, carrying out the
auctioning event, and closing an auctioning event;

• auctions involve the use of a catalogue, containing all offers at hand, and a bid
status, containing the current state of bidding.

Below we introduce a networked enterprise model for electronic auctions. Fig. 6
shows the roles involved in an auction and the business functions they fulfil. Fig. 7
shows the functions and flows.

Fig. 6. Role diagram for electronic auctions

Fig. 7. Function diagram for electronic auctions

In this way, a catalogue can be built for e-business components, containing 
components at different levels of detail, ranging from networked enterprise models to 
protocol descriptions. These components will be linked to e-business implementation
components and standards. The components form the e-business process building
blocks.
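
To indicate what a generically applicable component that can be specialised might look like, the common auction characteristics listed above can be sketched as a base class. This is a minimal illustration, not part of the process library itself: the class and method names are hypothetical, and the ascending-price rule in EnglishAuction is just one assumed specialisation.

```python
from dataclasses import dataclass, field


@dataclass
class Auction:
    """Generic auction role: opens, carries out and closes an auctioning event."""
    catalogue: dict = field(default_factory=dict)   # offer id -> (seller, description)
    bid_status: dict = field(default_factory=dict)  # offer id -> (buyer, amount)
    is_open: bool = False

    def receive_offer(self, offer_id, seller, description):
        """Add a seller's offer to the catalogue."""
        self.catalogue[offer_id] = (seller, description)

    def open_event(self):
        self.is_open = True

    def receive_bid(self, offer_id, buyer, amount):
        """Record a buyer's bid if the event is open and the bid is accepted."""
        if not self.is_open or offer_id not in self.catalogue:
            return False
        if not self.accepts(offer_id, amount):
            return False
        self.bid_status[offer_id] = (buyer, amount)
        return True

    def accepts(self, offer_id, amount):
        """Specialisation point: each auction type defines its own rule."""
        raise NotImplementedError

    def close_event(self):
        """Close the event and return the final state of bidding."""
        self.is_open = False
        return dict(self.bid_status)


class EnglishAuction(Auction):
    """Assumed specialisation: a bid must exceed the current highest bid."""

    def accepts(self, offer_id, amount):
        current = self.bid_status.get(offer_id)
        return current is None or amount > current[1]


auction = EnglishAuction()
auction.receive_offer("lot-1", "alice", "antique clock")
auction.open_event()
auction.receive_bid("lot-1", "bob", 100)
auction.receive_bid("lot-1", "carol", 90)  # rejected: not higher than 100
print(auction.close_event())
```

Other auction types would override only the acceptance rule, leaving the roles, the catalogue and the bid status untouched.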
3.2 Technologies for e-Business

One of the most important problems for companies getting involved in e-business is
the overwhelming number of standards, products and technologies available. What
will XML mean for me? What will be the most important means for Internet
payments? What are good architectures for e-business applications? The number of
acronyms passing by is too much to handle for any business (Fig. 8).

Fig. 8. Overview of some recent technologies for e-business (GigaTS 2000)



Guidance in the field is very much needed to overcome the barrier of technological 
knowledge. Next to the methodology and process library, continuous monitoring of 
ICT developments is needed. 

Besides monitoring, explanation of the consequences of technological 
developments in terms of business objects is needed as well. Demonstrators, showing 
how technology can support business, do so. 



4 Combining Components, Technology and Methodology 

In this paper we argued that e-business engineering is a highly complicated task. With 
the rise of the Internet many companies feel that they have to move to e-business, yet 
they find themselves hindered by lack of information, knowledge and solid 
approaches. The field is relatively new, and therefore, relatively immature. 

To help companies on their way to the effective introduction and exploitation of
e-business, an integral approach to e-business engineering is needed. Such an approach
should start from the business perspective and not from a purely technological 
perspective, yet should be based on state-of-the-art knowledge of standards and 
products. 






We have shown the building blocks of such an integral approach. The combination 
of a process library, technology assessments and Rapid Service Development takes 
care of most of the difficulties organisations experience now. To the best of our 
knowledge, such an integral approach is still lacking. 

Of course, quite some steps still have to be taken to arrive at a truly effective 
approach. First of all, not all cornerstones in the method are well defined yet. 
Especially the step from business oriented models to implementation oriented models 
is a difficult one. It usually involves a paradigm switch, moving from business 
modelling techniques to object oriented techniques like UML. As our proposed 
methodology switches back and forth between the two sides of the framework, the 
link has to be worked out in detail. 

Also, the added value of the process library highly depends on the quality and 
architecture of its contents. At this moment, very few e-business processes are 
documented properly. To obtain the right information in different application areas is 
a cumbersome task. By means of a series of cases in different fields we strive to
obtain this knowledge.

Finally, no method can be applied efficiently without effective tool support. At both
the modelling level and the system design level, tool support is still too
limited.

4.1 The Giga Transaction Services Project 

The approach discussed in this paper is being developed in a project called Giga 
Transaction Services. GigaTS supports organisations in the development of 
innovative transaction services. It does so with state of the art knowledge, methods 
and software tools that allow for effective development of new services. GigaTS takes 
the business perspective as a starting point, looking at networks of organisations and 
the way e-commerce technology can support them. Methods and tools are rooted in a 
combination of technological and business knowledge of currently available and 
future components and e-commerce applications. In this way re-use of components is 
promoted and time-to-market of services is reduced. Fast and effective design and 
introduction of e-commerce services is our central objective. The GigaTS project 
started in May 1999 and will take four years. More information on the project and all
deliverables mentioned can be found at http://gigats.telin.nl.

The approach to legacy integration is being developed together with people from
the KUB in the Telematics Institute project “Process integration in E-commerce.”

Managing Information on the Web



1 Overview 

One of the major issues in which Web Engineering differs from traditional
computing applications is information or content management. Until now, software
developers have dealt with data mostly in terms of storage and retrieval
techniques and fine-tuning the performance of a given application. The development
of Web-based systems forcefully brings to the front the questions of content 
management (information modelling, authorship, volume, rate of change, and 
maintenance). The textual and hypertextual characteristics of Web-based content 
make it imperative that all Web developers understand the nature of the information 
their applications will have to deal with. 

The first paper in this section, Layout, Content and Logic Separation in Web
Engineering, brings out the complexity of (hyper)textual information and Web-based
systems in terms of layout, content and logic. The rapid development of flexible,
layout independent Web sites is an increasingly important problem. Flexibility,
scalability and the ability to adapt to evolving layout requirements are key success
factors for many Web sites. A fundamental way to meet these requirements is to
strictly separate business logic from the layout and the content. The World Wide Web 
Consortium's XML and XSL standards aim at providing the separation between 
layout and content only. This paper describes the ongoing work in separating the 
layout, the content and the logic of web sites and shows how this separation is 
supported by the tool MyXML. The underlying concepts of the solution are a 
declarative description of the layout information, automatic generation of static and 
dynamic pages and support of interconnection to extended information sources such 
as databases. 

The second paper, Restraining Content Explosion vs Constraining Content Growth,
tackles the question of growing volumes of information. As users learn new ways to
explore the potential of Web sites and applications, the Web sites are faced with the 
problem of information content that grows rapidly, generating problems of content 
management and maintenance. It becomes essential, therefore, to distinguish 
explicitly between the responsibilities for content generation (end user involvement) 
and those for managing the (decentralised) information systems, a province of Web 
Engineering. 

The third paper, A Classification of Web Adaptivity: Tailoring Content and
Navigational Systems of Advanced Web Applications, examines the problem of
information management by considering consumer behaviour. The Web is
transforming the role of users/customers, eroding their brand loyalties. Adaptive Web
applications are proposed as a means to retain customers. However, the
intended type of adaptivity is not yet clearly understood and needs to be explored in
detail. The paper focuses on three types of adaptivity: content, primary navigation,
i.e. direct links within the documents, and supplementary navigation through index
pages, trails, guided tours and so on. Such adaptivity impacts on the design of, and
total effort involved in building, Web-based applications.

The last paper, The developers' view & a practitioner's approach to Web 
Engineering, emphasises the issues of content and application reuse, ease of 
maintenance and interoperability. The paper argues for a formal framework to tackle 
these issues and reports a case study on the adoption of such a framework and
subsequent development of a support environment using XML and RDF.

Abstract. The rapid development of flexible, layout independent web
sites is an increasingly important problem. Flexibility, scalability and
the ability to adapt to evolving layout requirements are key success
factors for many web sites. A fundamental way to meet these requirements
is to strictly separate business logic from the layout and the content.
The World Wide Web Consortium’s XML and XSL standards aim at 
providing the separation between layout and content only. In this paper, 
we describe our ongoing work in separating the layout, the content and 
the logic of web sites and show how this separation is supported by the 
tool MyXML. The underlying concepts of our solution are a declarative 
description of the layout information, automatic generation of static and 
dynamic pages and support of interconnection to extended information 
sources such as databases. 



1 Introduction 

Today’s web sites provide highly dynamic, personalized and interactive services 
to users. The maintenance of these services becomes non trivial when the amount 
of information exceeds a certain threshold. Key requirements for successful web 
sites are support for change (e.g., change of layout requirements) and scalability. 
As the size of the information in a web site grows, it may become difficult to 
adapt to evolving requirements [13]. 

Many technologies and tools exist that support the web engineer in building 
web services quickly. Popular technologies such as Coldfusion [1] and PHP [10] 
all aim at cutting down development time by automating frequently needed 
functionality (e.g., database queries, displaying dynamic information in a certain 
layout, etc.). 

The tools used for web engineering can generally be classified as being server-side
or client-side. Applets and JavaScript are typical examples of client-side
technologies. Java servlets and PHP are well-known examples of server-side
technologies. Template-based technologies, such as Coldfusion or JSP, are usually
server-side tools and use templates and scripting languages to define output and
program logic. We have found template-based solutions to be especially useful
in achieving scalability and layout flexibility. But complete layout flexibility can
only be achieved if there is a strict separation between the content and the layout.
The content and layout information, however, in most web sites are tightly
coupled as HTML files. To modify the layout, first the content information has
to be identified. Then it has to be extracted. The last step is to integrate the
extracted content information with the new layout. The whole process may become
complicated and difficult, especially if the layout or the content is not well
structured.

* This project is supported in part by the European Commission in the Framework
of the IST Programme, Key Action II, on New Methods of Work and eCommerce.
Project number: IST-1999-11400 (MOTION) and in part by the Austrian Academy
of Science.

The World Wide Web Consortium’s Extensible Markup Language (XML)
[15] along with the Extensible Stylesheet Language (XSL) [16] technology aim at solving
the layout and content separation problem. Ultimately, complete layout indepen- 
dence can be achieved by the use of XML and XSL. This is especially important 
when providing support for a variety of devices (e.g., cell phones, digital assis- 
tants, etc.). By a layout-content separation based on XML/XSL the layout can 
be modeled according to the display characteristics of a device. Layout indepen- 
dence will be a basic requirement for web sites of the future that will have to 
provide services to a large variety of different devices, components and interfaces. 

Although the layout and content separation problem has been attacked in- 
tensively (e.g., by standards such as XML, XSL, Cascading Style Sheets, etc.), 
the problem of separating the business logic from the layout and content in 
dynamic web services has not received much attention yet. Most popular web 
technologies (such as PHP, JavaScript, Active Server Pages (ASP) and Java 
Server Pages (JSP)) are XML unaware and do not exploit its capabilities. These 
tools and technologies lack support for the creation and maintenance of layout- 
independent dynamic web content. Some XML-based solutions for separating 
the logic from the content and layout have been proposed recently [5,7,6] but no 
standards yet exist. 

This paper describes our ongoing work to solve the layout, content and logic 
separation problem faced by highly dynamic and evolving web sites. The paper 
discusses our web engineering experiences with the Vienna International Festival 
(VIF) web site and presents MyXML, an XML/XSL-based tool designed and
implemented for the rapid creation of flexible, layout-independent dynamic web
content. MyXML is downloadable from http://www.infosys.tuwien.ac.at/myxml.
We are currently evaluating MyXML in another web project involving
large sets of dynamic and static information.

1.1 Terminology 

A web-based service is concerned with the following three issues. 

Business Logic. We define the business logic of a web site as the functionality
that is necessary for providing the dynamic interaction and services to the users.

Content. We define the content of a web site as the “real” information offered
by it (e.g., price lists, titles, names, etc.).

Layout. Layout denotes the formatting information applied to the content to 
display it. 



2 Separation of Layout, Content and Logic in Large Web 
Sites 

In this section we first motivate the problem by introducing our case study and 
then the template-based approach to web application design. 

2.1 Case Study: The VIF Web Site 

The Vienna International Festival (VIF) offers a bilingual web service on culture
and arts, multimedia data and several interactive services. Since 1995, our group 
has been involved in managing and building the VIF web site [11,12]. Popular 
web engineering tools did not offer good support for the complete life cycle of this 
dynamic web service. Until now, this site has been powered by web engineering 
tools we have developed to bridge this gap [2,14]. Our tools used templates for 
the generation of static HTML pages and concentrated on making the static 
content generation as flexible as possible. 

Most of the dynamic information of the Vienna International Festival is 
stored in a database. Static HTML pages are generated from the information in 
the database which is updated as new information becomes available. One im- 
portant requirement is that the layout of the web site changes every year. The 
layout depends on the theme of the festival in the specific year and considerable 
modifications are necessary. One drawback of our previous web engineering tools, 
which is also true for many other popular web technologies we have evaluated, 
has been the lack of support for layout independent dynamic web content. Our 
previous tools did not support a complete layout-content-logic (LCL) separation. 
We used macros and templates for achieving flexibility and met our requirements 
to a large extent. But the content and layout were still mixed in the HTML-like files
used by our tools.

To achieve a complete layout, content and logic separation in partially or 
fully database backed web sites, we identified three important requirements: 

1. The ability to extract static information from a database and to structure it
(this can be done using XML).

2. The ability to define hooks within the layout where dynamic content can be 
hung (i.e., the logic is separated from the layout). 

3. The ability to define templates with CGI and SQL support so that frequently
needed database and business logic can be generated automatically. Interfaces
need to be generated for the integration of further user-defined logic.

The idea is to define the content, the layout and the logic separately (separation
of concerns). Such a separation allows people responsible for the layout
or the logic to work independently of the people working on the content. The
logic can be changed without affecting the layout or the content.
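
The idea of keeping the three concerns apart can be illustrated with a few lines of Python. This is only a sketch of the principle, not of MyXML or of an XML/XSL pipeline: it assumes Python's standard string.Template as a stand-in for the layout, and the event record, the hook name `dynamic` and the function names are hypothetical.

```python
from string import Template

# Content: structured data, e.g. extracted from a database (hypothetical record).
content = {"title": "Opening Concert", "date": "2000-05-12"}

# Layout: two interchangeable layouts; $dynamic is the hook where
# dynamically computed content is hung.
layout_html = Template("<h1>$title</h1><p>Date: $date</p><p>$dynamic</p>")
layout_text = Template("$title ($date) - $dynamic")


# Logic: computed without any knowledge of the layout.
def seats_message(seats_left):
    return "sold out" if seats_left == 0 else f"{seats_left} seats left"


def render(layout, content, seats_left):
    """Combine content, layout and the result of the logic."""
    return layout.substitute(content, dynamic=seats_message(seats_left))


print(render(layout_html, content, 3))
print(render(layout_text, content, 0))
```

Replacing `layout_html` by `layout_text` changes the presentation without touching the content record or `seats_message`, which is the independence that the separation is meant to provide.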

An example from the VIF web site shows the lack of a clear LCL separation. 
The visitors of the site can choose among all the performances and request details 
on an event by simply clicking it. In the background, a Perl script is executed 
on the server which queries the database and retrieves the details of the selected 
event. Finally, the information from the database is formatted and returned to 
the client. For reasons of brevity we have simplified the example and merely 
create a short description of the event in a simple layout. 

Figure 1 shows that the layout information, the content and the logic are 
combined in a single Perl file. A slight change of the layout would force a modifi- 
cation of the Perl script. In fact, not only this single script but all static HTML 
pages and all other scripts are likely to need an update. A problem like this can 
be avoided by an LCL separation. 

2.2 LCL Separation Using Template Engines 

Web template engines are tools that generate HTML from templates defined
by the user. Coldfusion [1] and Webmacro [18] are two such tools that have
gained popularity among web application developers.

The rationale of template-based web development is to create templates for 
the input and output of an application. Coldfusion uses HTML-like code and 
HTML tags for defining layout and functionality. The user can build SQL func- 
tionality into the templates and can define how the result set of the database 
query should be formatted and output on a browser. An application server is usu- 
ally deployed in conjunction with the web server. The application server parses 
the templates and HTML source code and returns the output to the client (or 
the web server if the engine is built into it). Template engines enable the web 
developer to create dynamic and interactive web functionality rapidly. Program- 
ming language constructs like if-then-else or loops are frequently supported to 
facilitate conditional functionality or iterations. Although template-based web 
engineering tools automate web application programming to a large extent, most 
existing tools do not completely address the LCL separation problem. 
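
The basic idea of template-based generation can be illustrated with a minimal
Java sketch. It does not reproduce the mechanism of Coldfusion or Webmacro: the
class name TinyTemplate and the ${name} placeholder syntax are assumptions made
for this illustration only. The template string carries the layout, the map
carries the content, and render() is the only logic involved.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, minimal template engine: replaces ${name} placeholders.
final class TinyTemplate {
    // Assumed placeholder syntax: ${identifier}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Fills the layout template with content values; unknown names become "".
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(escapeHtml(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Escapes the characters that would otherwise be read as markup.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With such a split, a layout change touches only the template string, for
example TinyTemplate.render("<H1>${title}</H1>", Map.of("title", "Macbeth")),
while the code that supplies the values stays unchanged.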

#!/usr/local/bin/perl
use CGI;
use DBI;

print "Content-Type: text/html\n\n";

$query = new CGI;
$event_id = $query->param("id");

$dbHandle = DBI->connect(
    "dbi:mysql:WWW:www.festwochen.at:10000",
    "some_user",
    "some_passwd", {}) || die
    "Cannot connect to database: $DBI::errstr";
$dbquery = "select distinct title, date,
            description from VIF_EVENTS
            WHERE id = '$event_id'";

$dbqh = $dbHandle->prepare($dbquery);
if (!($dbqh && $dbqh->execute())) {
    print "Error occurred on $dbquery\n";
    exit;
}

$current = $dbqh->fetchrow_hashref();

print("<HTML>\n");
print(" <HEAD>\n");
print(" <TITLE>\n");
print(" VIF - $current->{'title'}\n");
print(" </TITLE>\n");
print(" </HEAD>\n");
print(" <BODY>\n");
print(" <H1> $current->{'title'} </H1>\n");
print(" <TABLE BORDER=1>\n");
print(" <TR>\n");
print(" <TD>Date:</TD>\n");
print(" <TD>$current->{'date'}</TD>\n");
print(" </TR>\n");
print(" <TR>\n");
print(" <TD>Description:</TD>\n");
print(" <TD>\n");
print(" $current->{'description'}\n");
print(" </TD>\n");
print(" </TR>\n");
print(" </TABLE>\n");
print(" </BODY>\n");
print("</HTML>\n");

Fig. 1. Retrieval and formatting of event information in Perl

3 The MyXML Template Engine

MyXML is an XML/XSL-based template engine that supports a strict separation
of layout, content and business logic. The content and its structure are
defined in well-formed XML documents, the layout information is given as an XSL
stylesheet and the business logic is defined separately in an arbitrary
programming language. The template functionality of the MyXML engine is
exploited by using special MyXML elements in the input XML document. These tags
(i.e., elements) are defined in the MyXML template language that is based on
the MyXML namespace definition [3]. The layout stylesheets that can be applied
to a MyXML document are arbitrary XSL transformations.

The functionality of the MyXML template engine is based on the MyXML
process, which defines the actions to be taken depending on the type of the
MyXML input document. Figure 2 shows the MyXML process.

Fig. 2. The MyXML process



The process starts with a MyXML input document. Any well-formed XML 
document which may contain elements of the MyXML template language can be 
used as input document. In the next step a pre-processing XSL stylesheet can 
be applied to add layout information to the document. Additionally, the XSL 
stylesheet can be used to add static information to a document (e.g., header and 
footer) or to restructure the input document. After having passed this second 
step of the MyXML process the template engine processes the modified input 
file. 

The MyXML template engine distinguishes between two kinds of input doc- 
uments: static and dynamic documents. A MyXML document is considered 
static if all MyXML elements can be resolved during processing time. A dy- 
namic MyXML document, on the other hand, contains at least one dynamic 
MyXML element such as a reference to a CGI parameter or a dynamic database 
query which can only be evaluated at runtime. 
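
The distinction between static and dynamic documents can be sketched in Java
as a scan for runtime-dependent MyXML elements. This is only an illustration of
the rule stated above, not the engine's actual implementation: the class name
MyXmlDocumentKind is hypothetical, and the assumption that exactly the cgi and
single elements count as dynamic is ours (the engine may apply further
criteria, e.g., for dynamic database queries). The namespace URI is the one
declared in the MyXML input document.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: classifies a MyXML input document as static or dynamic.
final class MyXmlDocumentKind {
    // Namespace URI of the MyXML template language.
    static final String MYXML_NS = "http://www.infosys.tuwien.ac.at/ns/myxml";

    // Assumption: only these MyXML elements require runtime information.
    static final Set<String> DYNAMIC_ELEMENTS = Set.of("cgi", "single");

    // Returns true if at least one dynamic MyXML element occurs in the document.
    static boolean isDynamic(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches every local name within the MyXML namespace.
            NodeList myxmlElements = doc.getElementsByTagNameNS(MYXML_NS, "*");
            for (int i = 0; i < myxmlElements.getLength(); i++) {
                if (DYNAMIC_ELEMENTS.contains(myxmlElements.item(i).getLocalName())) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML document", e);
        }
    }
}
```

A document for which isDynamic() returns false could be resolved completely at
processing time, whereas any other document would require generated source
code.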

If the MyXML engine detects a static input document, it processes the
MyXML elements that it contains. Optionally, it applies a post-processing XSL
stylesheet before creating the result file which usually, but not necessarily,
is an HTML or an XML file. Such a post-processing stylesheet could be used to
add additional layout information based on the result of a database query.

If a dynamic input document is passed to the MyXML template engine, some
source code has to be generated which handles all dynamic aspects defined in 
the input document. As was mentioned above, arbitrary programming languages 
can be supported by the MyXML engine. A special code generator interface 
represents the link between the MyXML template engine and the business logic. 
By implementing this interface for a given programming language, support for 
that language can be added to the MyXML engine. The current implementation 
supports generation of Java code which can easily be used by servlets or other 
programs encapsulating the business logic of a web site. 
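
How such a code generator interface might look is sketched below. The actual
interface of the MyXML engine is not given in this paper, so the names
MyXmlCodeGenerator, JavaCodeGenerator, emitStaticText and emitCgiParameter are
hypothetical. The sketch only illustrates the design: the engine calls the
interface, and one implementation per target language decides which source
code is emitted.

```java
// Hypothetical interface between a template engine and a target language.
interface MyXmlCodeGenerator {
    // Source code that writes a piece of static markup.
    String emitStaticText(String text);

    // Source code that writes the value of a CGI parameter at runtime.
    String emitCgiParameter(String name);
}

// Hypothetical implementation that emits Java statements.
final class JavaCodeGenerator implements MyXmlCodeGenerator {
    @Override
    public String emitStaticText(String text) {
        return "pw.println(\"" + escape(text) + "\");";
    }

    @Override
    public String emitCgiParameter(String name) {
        // getCGIParameter is the accessor of the generated class.
        return "pw.println(getCGIParameter(\"" + escape(name) + "\"));";
    }

    // Escapes backslashes and double quotes for use in a Java string literal.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Under this assumption, supporting Perl or PHP would amount to providing another
implementation of the same interface, leaving the engine itself untouched.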

In this section, we redesign the simple example presented in Sect. 2.1 to work
with the MyXML template engine. First we have to set up the MyXML input
document which describes the content of the page to be generated (see Fig. 3).

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE event_search>
<event_search xmlns:myxml=
    "http://www.infosys.tuwien.ac.at/ns/myxml">
  <myxml:sql>
    <myxml:dbcommand>
      SELECT title, date, description FROM VIF_EVENTS
      WHERE id = '<myxml:cgi>id</myxml:cgi>';
    </myxml:dbcommand>
    <event>
      <title><myxml:dbitem>title</myxml:dbitem></title>
      <date><myxml:dbitem>date</myxml:dbitem></date>
      <description>
        <myxml:dbitem>description</myxml:dbitem>
      </description>
    </event>
  </myxml:sql>
</event_search>

Fig. 3. Event page defined in MyXML



Figure 3 uses the <myxml:sql> and <myxml:cgi> elements to model the
user input (i.e., the selected event) by means of a CGI parameter and a database
query depending on the user's choice. In addition, we provide the contents of
the title, date and description fields in the result set using the
<myxml:dbitem> element. These values are processed and formatted further by the
XSL pre-processing stylesheet.

In the next step the XSL stylesheet shown in Fig. 4 adds layout information
to the MyXML input document. Note that the MyXML stylesheet is imported
to process the elements defined in the MyXML template language.

<?xml version="1.0"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:myxml="http://www.infosys.tuwien.ac.at/ns/myxml">
  <xsl:import href="myxml.xsl"/>
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="event_search">
    <HTML><BODY><xsl:apply-templates/></BODY></HTML>
  </xsl:template>

  <xsl:template match="event">
    <H1><xsl:apply-templates select="title"/></H1>
    <TABLE BORDER="1"><xsl:apply-templates/></TABLE>
  </xsl:template>

  <xsl:template match="date">
    <TR>
      <TD>Date:</TD>
      <TD><xsl:apply-templates/></TD>
    </TR>
  </xsl:template>

  <xsl:template match="description">
    <TR>
      <TD>Description:</TD>
      <TD><xsl:apply-templates/></TD>
    </TR>
  </xsl:template>
</xsl:stylesheet>

Fig. 4. XSL stylesheet for the event page



This stylesheet processes the input document and adds static elements such 
as the <HTML> or <BODY> elements. Furthermore, it formats the title of an event 
as heading and arranges the additional information in a simple HTML table. 
After having applied this stylesheet, all the layout information is included in 
the document, but the actual content is still missing. Thus, in the next step the 
MyXML engine determines whether the document is static or dynamic. In this
example, the content is the result of a database query which depends on user 
input. Thus the document is dynamic and triggers the generation of Java code. 
The generated source code is shown in Fig. 5. 

In the generated Java code, the HTML code of the web page can be accessed
with the printHTML() method. Thus a servlet class only has to instantiate the
class and call its printHTML() method to dynamically process a user's request.
As a result, the business logic remains completely independent of the content
and layout information encapsulated in the class. Even the handling of CGI
parameters and SQL queries is hidden from the business logic, which introduces
another level of abstraction. With the integrated support for CGI parameters
and database access we tailored the MyXML engine explicitly for the rapid
development of layout independent web sites.

public class event {

    protected HttpServletRequest request = null;
    protected ResultSet SQL0 = null;

    public event(HttpServletRequest request) {
        this.request = request;
    }

    protected String getCGIParameter(String paramName) {
        return request.getParameter(paramName);
    }

    protected ResultSet processSQLStatement(String select, String user,
            String pwd, String url, String driver) {
        // process database queries and return result set
    }

    public void printHTML(PrintWriter pw) {
        pw.println("<HTML>");
        pw.println(" <BODY>");
        printHTMLSQL0(pw);
        pw.println(" </BODY>");
        pw.println("</HTML>");
    }

    public void printHTMLSQL0(PrintWriter pw) {
        try {
            SQL0 = processSQLStatement(
                "SELECT title, date, description FROM VIF_EVENTS WHERE id = '" +
                getCGIParameter("id") + "';", "some_user", "some_passwd",
                "jdbc:mysql://www.festwochen.at:10000", "org.gjt.mm.mysql.Driver");
            while (SQL0.next()) {
                pw.println(" <H1>" + SQL0.getString("title") + "</H1>");
                pw.println(" <TABLE BORDER=\"1\">");
                pw.println(SQL0.getString("title"));
                pw.println(" <TR>");
                pw.println(" <TD>Date:</TD>");
                pw.println(" <TD>" + SQL0.getString("date") + "</TD>");
                pw.println(" </TR>");
                pw.println(" <TR>");
                pw.println(" <TD>Description:</TD>");
                pw.println(" <TD>" + SQL0.getString("description") + "</TD>");
                pw.println(" </TR>");
                pw.println(" </TABLE>");
            }
        } catch (SQLException se) {
            // ignore exception and continue operation
        }
    }
}

Fig. 5. Automatically generated Java code

The MyXML template language has several other elements besides
<myxml:sql> and <myxml:cgi>. The <myxml:loop> and <myxml:multiple>
elements allow the engine to repeatedly process parts of a document (e.g.,
for generating the list of items a user has stored in a shopping cart). The
<myxml:single> element represents a user-defined variable whose value is
determined at runtime (e.g., the name of the user currently logged in). The
<myxml:attribute> element, finally, can be used to dynamically set the
attribute of another element (e.g., the src attribute of an HTML <IMG> or the
href attribute of an HTML link). A detailed discussion of all these elements as
well as additional attributes for the elements presented here is beyond the
scope of this paper and can be found in [3].

Although the MyXML-based solution of the presented example includes more
files and has a higher complexity than the original solution given in Fig. 1, it
adds a great amount of flexibility, reusability and maintainability to the site.
Using the strict separation of layout, content and business logic makes it easy to
change or reuse any of the three parts independently from the others. All that is
needed after an update is a regeneration of the affected pages using the MyXML
template engine. Figure 6 shows the resulting HTML code for an arbitrary user
request.

<HTML>
 <HEAD>
  <TITLE>VIF - Macbeth</TITLE>
 </HEAD>
 <BODY>
  <H1>Macbeth</H1>
  <TABLE BORDER="1">
   <TR>
    <TD>Date:</TD>
    <TD>07.03.2000</TD>
   </TR>
   <TR>
    <TD>Description:</TD>
    <TD>Shakespeare's famous play.</TD>
   </TR>
  </TABLE>
 </BODY>
</HTML>

Fig. 6. The resulting web page for a user's request

4 Related Work

There are many template-based products and tools in the market. Most of these 
tools are HTML oriented and are not XML aware. 

Coldfusion [1] and Webmacro [18] are two such tools that are popular among 
web application developers. Both of these tools provide custom scripting lan- 
guages and enable a rapid development of web applications. SQL and CGI 
functionality is supported but the tools do not provide layout independence. 
Coldfusion, for example, has built-in support not only for databases, but also 
for many other external resources such as directory and naming services, e-mail, 
etc. Coldfusion scripts consist of HTML-like constructs and the layout and logic 
information are not separated and are tightly coupled. Neither Coldfusion nor 
Webmacro supports an LCL separation that makes a web site flexible for mod- 
ifications. This is also true for most other template-based web tools currently 
found on the market. 

Rhythmyx [9] is a fully graphical development environment. The tool achieves
layout independence by analyzing user-defined HTML templates and generating
XML/XSL from them. Resources like pictures and template files are represented
graphically on the screen. The developer visually connects the resources to build
the necessary functionality. The use of XML and XSL is transparent to the user,
who may work using a common HTML editor. The disadvantages of this tool
are that it is not platform independent and requires the use of an application
server. Furthermore, web sites and applications can become very complicated, and
graphical tools that work without code have their limitations (e.g., the graphical
display of a large number of dependencies is not always easy to understand).

The Apache Cocoon project [5] offers support for the layout-content-logic 
separation problem in web sites. Cocoon is a servlet program that is based on 
freely available XML parsers (e.g., OpenXML [8], Xerces [20], etc.) and XSL 
processors (e.g., KVisco [4], Xalan [19], etc.). Cocoon can be used for the real- 
time translation of XML files to HTML on a web server. 

Cocoon proposes two technologies for providing layout independent dynamic
content in web pages: eXtensible Server Pages (XSP) [7] and Dynamic Content
Processor (DCP) [6]. XSP is based on XML/XSL technology. Code is built into
so-called logic sheets, which are code generation style sheets. XSP generates
compilable source code. DCP is simpler than XSP and is an interpreted language.
It is therefore slower than XSP, but it is based on and similar to XSP.

MyXML, like DCP, provides hooks for the business logic so the dynamic
content generation does not have to be defined in a logic sheet. Our approach is
different from DCP in that MyXML is not an interpreted language. In contrast
to XSP, Java code in MyXML is not integrated into XSL style sheets.
The definition of the business logic is independent of the XML/XSL technology.
MyXML also has built-in support for SQL and can be used to generate XML
from relational databases.

5 Future Work

We are currently applying and testing MyXML in a web project. This project
involves highly dynamic organizations with large sets of information.

We plan to deploy MyXML in another project to build form-based
functionality, to implement update mechanisms for the information in the
database and to build static HTML pages from a central DBMS.

We will integrate support for a language other than Java (e.g., Perl
or PHP) and are considering integrating XML Schemas [17] in addition to (or
instead of) DTDs in the future.

We are also planning to integrate XSL formatting objects [16] into the 
MyXML template engine. Formatting objects enable the creation of all sorts 
of data formats from XML data sources. By the use of formatting objects it 
could, for instance, be possible to generate a PDF file from parts of the web site. 

6 Conclusion 

Traditionally, the content and layout information are tightly coupled with the 
business logic of a web application. The clear separation of content from logic 
enables the web developer to build flexible web sites and supports ease-of-change. 
The responsibility can be divided among people responsible for the content, the 
people responsible for the layout and developers who are building functionality. 

Our contribution is a tool that facilitates web site development and mainte- 
nance by achieving a complete layout-content-logic (LCL) separation. Many web 
tools exist today that support the web developer in creating web sites. To our 
knowledge, only Cocoon XSP and DCP support the complete LCL separation. 

Template-based web engineering has been around for some time now. We 
have extended the existing concepts with the advantages of XML and XSL to 
achieve layout independence. We have designed and implemented a tool that 
is XML oriented and takes advantage of XML and XSL for the separation of 
content, layout and logic. 

MyXML enables the rapid development of layout independent, partially or
fully database backed web sites. The MyXML language is simple to learn and use,
yet powerful enough to have built-in support for frequently needed functionalities
such as SQL queries and CGI parameters.

Abstract. As users learn new ways to explore the potential of the Web, the
information content in sites that are effectively explored tends to grow.
Web technology is so powerful and flexible that the limit of information
management is usually reached before other technical restrictions
and before we make the most of what the Web can offer. In this paper we
investigate the scenario where the content growth exceeds the capacity
of information maintenance. We also present a strategy to deal with
this problem without resorting to policies to constrain the growth of
information content. By means of a case study we exemplify how a
successful implementation was used to avoid the bottlenecks that contribute
to content explosion.



1 Introduction 

The Web grows. This assertion applies not only to the increasing number of 
new Internet sites popping up every day but also to the information content in 
successfully established Web installations. This is certainly a measure of how this 
technology matches our needs: it is powerful, flexible and easy-to-use. Indeed, 
it has proved to be so attractive that sometimes it spreads too fast, and faster 
than we can plan. 

While the boom in the number of sites (and the demand for bandwidth)
has presented a challenge for the world telecommunication infrastructure, the
second kind of growth, the inner growth, is surely a field for Web Engineering.
Old modest and unpretentious “home pages” have evolved into huge
institutional/corporate Web sites, and the figure of the lonely webmaster who
used to spend spare time playing with the HTML (Hypertext Markup Language)
editor has been replaced by highly specialised teams of professionals who work
full time to keep up with the evolution of the Web.

Rare are the cases, though, where the efforts made by the Web department
are enough to handle the volume of work associated with the maintenance of a
large Web site. No matter how hard the Web application designers try to foresee
the work-flow they will have to deal with, it often exceeds expectations
and frustrates the best formulated management plans.

The problem lies in the initial assertion: the Web grows. Indeed, this
happens not only because the amount of data to be made available increases with
time, but also because of the particular qualities of the Web. Being easy to use
and nice to play with, it invites users to publish all kinds of information;
being flexible, it lets us discover new uses every day. As soon as we have put
everything we want into the Web environment, someone else comes up with a
brilliant new idea and adds to the information content.

We believe that such a situation must sound familiar to many of those who 
have ever been involved with the maintenance of an average-size Web site. In 
fact, given the dynamic nature of information systems (both time-dependent 
data and also the user needs), it is reasonable to expect this problem to occur 
in Web sites that are effectively utilised. 

2 Content Explosion 

Content explosion is very likely to occur as the result of centralised
information maintenance. When faced with the inherent flexibility of Web
technology, centralised approaches are challenged by the fact that there is
often no other technical constraint to limit the use of the Web, and content
maintenance thereby becomes a probable bottleneck.

In this scenario, Web maintainers reach an impasse: either to limit the
content growth, and thus use less than the Web can really offer, or to opt for
information excellence and head towards content explosion. While the former is
usually preferred, once we recognise that prompt access to precise information
is (more and more) a target to be reached and that the Web provides us with the
tools to move towards this goal, this decision can be made only by wasting the
potential of the Web.

The question that naturally arises in the face of these concerns is: is it
possible to restrain content explosion without constraining content growth? In
the light of Web engineering it is reasonable to assert that, if suitable
techniques are adopted, almost any volume of content can be readily managed and
maintained.

It is our experience with one such technique that we intend to share
with other researchers by means of a case study, which we believe to be a
convenient way to introduce a methodology based on the decentralisation of
content maintenance.

3 Restraining Content Explosion 

Over the last few years the authors have been involved in the design and
maintenance of a complex real-world application, which has provided all of us
with invaluable lessons on the management of large Web sites. The experience
gained through practical investigation has allowed us to identify a suitable
strategy to deal with content explosion.

3.1 Identifying the Problem 

Back in the days when the Web had just begun and was known only to a few
people inside universities, a small team of researchers started to experiment with
this new and exciting technology at the Department of Electrical Engineering
(SEL) at EESC/USP.

When the first Web server of SEL was set up in 1995, that restricted group,
comprised of members of the Laboratory of Computing Vision (LAVI), including
the authors, was the only one able to write HTML documents. Just like everyone
else who was involved with Web development elsewhere, we could already
make out the impact that the Web would have in the next few years. The
forecast was accurate, and soon other faculty members became interested, as
did the department office. One year later, the LAVI Web group was overloaded by
an overwhelming, full-time Web publishing workload.

To summarise, the problems we were facing were associated with:

a) Workload. Since the first Web days the information content of the SEL Web
site had been growing steadily, up to the point where it was no longer possible
for a small group of webmasters to carry out the work. Even after
bringing more collaborators into the Web group and trying new HTML
editors, the workload was clearly too high and we could not foresee how much
and how fast it would increase.

b) Work-flow. The initial purpose of the SEL Web server was to publish
institutional information for external visitors. Soon, however, SEL members
became aware of the Web's potential as a channel for generic information
exchange and started to ask the webmaster to publish such information. When the
“Web culture” became broadly disseminated, the amount of time-dependent data to
be updated daily made it impossible to carry out the maintenance in time.

c) Management. Every time an item had to be changed on the Web (such as a
postgraduate student leaving the department), the whole site had to be
checked for consistency, and broken links took time to find and
fix. As a result, webmasters who had previously shown enthusiasm for Web work
now found themselves discouraged from implementing improvements or
performing changes beyond those that were absolutely necessary.

d) Complexity. As more and more information areas were integrated into the
Web, the complexity of the site increased proportionally. What was initially
proposed as a panel for institutional information had turned into a channel
for administrative data-flow, a distance learning platform, a scientific
database and an internal events schedule, among other purposes. Information
modelling became too specific to be understood by the webmasters and, on the
other hand, the information producers (secretaries, faculty members etc.) were
themselves not sufficiently skilled to deal with the growing complexity of Web
technology (this subject has been properly discussed by [1]).

e) Organisation. Both because the Web group was overloaded and because Web
work is attractive, many people at SEL started to create their own HTML
pages and link them to the SEL site without prior planning. Far from intending
to constrain the colleagues' initiative, the Web group did not discourage
this procedure but, as a result, the desired coherency of the site could not be
achieved either in visual harmony or in terms of navigability.



3.2 The Strategy 

In short, we had a typical case of content explosion. It was then that LAVI
members decided to invest in this new domain not only as a side activity but as
an academic investigation. The Web group evolved into the Web Task Force and we
started to design the first Web-engineered practical solution at SEL.

In addressing the content explosion, we noticed that no matter how
far we push the management limit, as soon as people learn new ways to take
advantage of the Web's potential they will force us to reach it. Unless we
accepted some restrictive policy to regulate the growth of the site, thus
diminishing the information content, we had to find a way to overcome this
situation.

The key we found to solve the paradox is not to let this limit exist, by
eliminating the bottleneck imposed by the finite capacity of the Web group. The
strategy adopted by the Web task force was to delegate the responsibility of
providing and maintaining the information to each generating agent, i.e.
every information source would be in charge of publishing and maintaining the
information of its own interest. In other words, we concluded that the
job of the Web task force should not be to generate content, but to manage a
decentralised information system.

It was thus necessary to design a Web application providing the functionality
needed to allow every faculty member, student, administrative official etc. to
access the Web to update information content. This application also had to offer
tools to overcome the differences in HTML skills and to guarantee the consistency
of the whole site.

3.3 The Application 

WebSEL [2], as the application was named, was based on a previous
experiment which had been developed as part of an academic work.

Basically, WebSEL consists of a hypermedia database that provides an
abstract model of the SEL information scheme [3]. The conceptual model of
relational entities was used to represent every information item that Web users
might be interested in, such as faculty members, research areas, laboratories
etc. A careful navigational structure was then designed to connect those data
representations through the database. Special attention was paid to formulating
such connections so that only context-dependent relevant links are presented, in
order to avoid well-known problems such as disorientation and cognitive
overload. WebSEL was implemented mostly in the Perl scripting language and it
runs on a Linux workstation.

When a WebSEL visitor requests any document from the server, a page
is dynamically generated on the fly by a set of CGI scripts from the information
contained in the database. Therefore, documents are always up to date (provided
the database is) and consistent, since the database is manipulated by means
of the WebSEL interface, which enforces data integrity (if a given researcher's
page points to his laboratory, this laboratory's page points to the researcher's
page).
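
The integrity rule given in parentheses above can be sketched as follows. The
paper does not describe how WebSEL stores its links, so the class LinkRegistry,
its method names and the page identifiers are hypothetical, and Java is used
here for illustration only (WebSEL itself was implemented mostly in Perl). The
sketch keeps every link symmetric and removes all back references when a page
disappears, which is the kind of change that produced broken links on a
hand-maintained site.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry that keeps links between pages symmetric.
final class LinkRegistry {
    private final Map<String, Set<String>> links = new HashMap<>();

    // Linking a to b always records the reverse link from b to a as well.
    void link(String a, String b) {
        links.computeIfAbsent(a, key -> new HashSet<>()).add(b);
        links.computeIfAbsent(b, key -> new HashSet<>()).add(a);
    }

    // Removing a page also removes every link that pointed back to it.
    void remove(String page) {
        Set<String> targets = links.remove(page);
        if (targets == null) {
            return; // unknown page, nothing to clean up
        }
        for (String target : targets) {
            Set<String> backLinks = links.get(target);
            if (backLinks != null) {
                backLinks.remove(page);
            }
        }
    }

    // Read-only snapshot of the pages linked from the given page.
    Set<String> linksOf(String page) {
        return Set.copyOf(links.getOrDefault(page, Set.of()));
    }
}
```

Because pages are generated from such a structure at request time, a removed
entry can no longer leave a dangling link on another page.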

Our experience with a dynamic site has shown us that the proposed appli- 
cation should be flexible enough to allow this model to be modified and also 
extended if needed. Thus, WebSEL provides the Web administrator with fea- 
tures to help him to manipulate the database structure without resorting to 
code hacking. 

While the above introduction briefly describes WebSEL in the context of
hypermedia theory, this is not the focus of the present work (a much more
complete and comprehensive treatment may be found in the specialised literature,
such as that on OOHDM [4] and RMM [5]). The main concern here is
information management from the point of view of Web engineering.

The key element of WebSEL to implement the decentralisation of content 
maintenance is to make use of the Web as a two-way channel: the same facilities 
that the Web provides for releasing documents to a large group of users can be 
used to collect information from them. 

When one retrieves any Web page through WebSEL, one of the presented
links points to the database updating tool. This is a Web interface (a standard
HTML page) through which the user can interact with the database to add and
change information associated with that page.

Evidently, users are allowed to change only the information that refers to
themselves and according to a given hierarchy (all data are password protected
to avoid unauthorised modification).

The application’s home page (the only static HTML document) is shown in 
Fig. 1(a), while Fig. 1(b) and Fig. 1(c) illustrate the pages of a faculty member 
and the page of a research area respectively. Fig. 1(d) shows the page where the 
user can enter information about a laboratory in the database. 



3.4 Practical Results 

With the approval of the administrative office, WebSEL was officially released
in July of 1998. Since then the SEL Web site has been intensely utilised by
people with quite diverse needs and knowledge.

With regard to the impressions of external visitors, the site has been clearly
approved, as may be concluded from the guest book, where the answers they
provide to a brief survey reveal that the visual and navigational design, as well
as the content itself, match their expectations. Internal users, although more
demanding, have also approved the work and have continuously contributed
suggestions and requests for new features.

The decentralisation strategy was successful in its attempt to restrain content
explosion, as the numbers show.

Before WebSEL was released the whole SEL site comprised about
100 HTML pages. While this number by itself does not serve as a measure of the
size of the site, keeping the highly dense information content up to date
and consistent with the daily changes was very difficult. It was clear that if
the number increased by a factor of two, this procedure would become nearly
impossible to perform satisfactorily.

Fig. 1. Screenshots of WebSEL interface

By January of 2000, though, with WebSEL in broad operation, visitors of our
Web site may browse more than 500 pages (dynamically generated on the fly from
a unified database), replete with detailed information on all department staff
including faculty members, secretaries and researchers, all graduate and
undergraduate disciplines, research fields, laboratories, institutional data,
schedule of activities etc. These pages are updated daily (hourly in some cases)
and the maintenance work is not as overwhelming as it was before. Indeed, if all
the students are given a home page as we intend to do, we expect the site to
reach 1000 pages or more, and this number does not scare the webmasters.

Making Web publishing easy was fundamental to getting all SEL members 
involved in the content maintenance process. With no special skills required, people 
who had been reluctant until then were finally motivated to make use of 
WebSEL. 

The job of the Web task force has not finished, though; it has just moved from 
content maintenance to application maintenance. Indeed, that was what 
we expected, and the workload has come down to an acceptable level. With the 
slogan “you are your webmaster”, the centralised information generation which 
had been delegated to the Web group was finally shared with other users, giving 
us concrete reasons to consider that the WebSEL designers’ aim was attained. 

4 Conclusions 

We believe that this case study can serve as a convenient example of a typical 
case of content explosion. By means of a suitable methodology based on the 
decentralisation of content maintenance, we have also suggested how it is possible 
to restrain content explosion without constraining content growth. By way 
of illustration, we used WebSEL to exemplify the implementation of such a 
methodology and presented positive results. 

The contribution that we intend to offer with this paper, though, does not 
concern a particular technology but rather the strategy to 
deal with content explosion, a situation that is very likely to come about in 
typical institutional/corporate sites nowadays. 

Considering the dynamic nature of information systems, the potential of 
the Web as a channel for information exchange, and the limits of centralised 
management, which are usually reached before we make the most of the Web, it is 
interesting to ask whether information content must be restricted to a 
maximum safe level or may be allowed to grow freely. 

By decentralising content generation and delegating 
the responsibility for data maintenance to the source of the information itself, it is 
possible to overcome the bottleneck imposed by a centralised methodology. What 
remains for the Web task force is Web application maintenance, which is indeed Web 
engineering. 

Abstract. All texts and, more specifically, their electronically 
distributed variations are multimodally articulated, integrating 
language, spatial arrangements, visual elements, and other semiotic 
modes [1]. Users usually have varying preferences regarding 
multimodal access representations or differing capacities to make use of 
them [2]. The number of alternatives provided by paper-based media is 
inherently limited. Adaptive hypertext applications do not share this 
limitation. This paper reviews popular methods and classifies them into 
three categories of information and their corresponding interface 
representation: Content of documents, primary navigational system 
(comprising links between and within these documents), and 
supplemental navigational systems (e.g., index pages, trails, guided 
tours, local and global maps). 

Keywords: Adaptivity, Annotations, Navigational System, User Modeling, 
Classification 



1 Introduction 

Despite technological and organizational changes, the real information needs of real 
customers provide a uniform purpose and guide for Web engineering. The role of 
customers is getting transformed in the virtual marketspace. The emergence of the 
World Wide Web has eroded brand loyalty and shifted the bargaining relationship 
between many industries and their potential buyers. Adaptive Web applications that 
increase customer delivered value have been suggested as a flexible solution. 
However, many prototypes that incorporate adaptive components are developed 
without a clear model of the components’ functionality - i.e., their intended type of 
adaptivity. This paper provides a conceptual guideline for this process. It focuses on 
three categories of information and their interface representation to optimally support 
the user's preferences and capacities: Content of documents, primary navigational 
system comprising links between and within these documents, and supplemental 
navigational systems such as index pages, trails, guided tours, or interactive site 
maps. The first category, content-level adaptation, is frequently used to tackle the 
problem of heterogeneous user knowledge. Summarizing, for example, can be useful 
both as an introduction and to condense a series of previously presented scriptons. 
Link-level adaptation is used to provide navigation support and prevent users from 
following paths irrelevant with regard to their current goals [3]. Supplemental 
navigational systems operate on the meta-level and generate various overviews of the 
system as a whole. They enable users to verify their location, confirm the context of 
the current document, or access their own interaction history. 

Table 1 summarizes 
adaptive presentation techniques and their applicability to the three categories 
described above. The question marks symbolize restrictions that have to be kept in 
mind when applying a particular technique: (a) Sorting of non-contextual links, for 
example, makes their order unstable and tends to decrease the system's usability [4]. 
Although designed for the benefit of the user, a constantly changing system can 
cause confusion and decrease confidence [5]. (b) Sorting of indexes can be equally 
problematic, as their structure might be rendered obsolete by that operation. (c) 
Hiding of contextual links can only be achieved via stretchtext, a technique that 
enables the user to collapse and uncollapse optional chunks of text (see the section on 
content-level adaptation). 



Table 1. Adaptive content-, link-, and meta-level Web presentation 

                              Primary Navigational System   Supplemental Navigational Systems
Presentation                  Contextual   Non-Contextual                Local and
Technique         Content     Links        Links            Indexes      Global Maps
---------------   -------     ----------   --------------   -------      -----------
Summarizing          X
Sorting              X                          ?              ?
Highlighting         X            X             X              X              X
Hiding               X            ?             X              X              X
Direct Guidance                   X             X              X              X
Annotation           X            X             X              X              X

The application of both highlighting and annotations is rather unproblematic, 
independent of presentational category. Highlighting facilitates access to complex 
information spaces and potentially increases the application’s interactivity, e.g., by 
raising curiosity, discussion, or comments from the user [6]. To acknowledge the 
importance of annotations as a powerful and very flexible presentation technique, the 
following section describes their conceptual foundations and areas of applicability. The 
other three techniques (sorting, hiding, and direct guidance) will be discussed 
thereafter. It is important to keep in mind that these techniques do not contradict 
each other but rather provide a number of synergies when used in combination. 

2 Annotations 

Most scholarly articles and books exemplify explicit hypertextuality in non-electronic 
form by using a sequence of numeric symbols to denote the presence of footnotes, 
signaling the existence and location of subsidiary texts (explanations, citations, 
elaborations, and so forth) to the main document [7, 8]. As far as printed material is 
concerned, however, the reader rarely is exclusively attracted by the footnotes and 
“becomes fascinated with the nonlinearity and incompleteness of such a collection of 
fragments, just as one does not give up a novel to start reading the phone directory” 
[9]. Annotations are quite similar to the concept of a footnote in traditional texts, but 
usually are added by an author or, collaboratively, by a group of authors different 
from the producer of the main document. Common interface representations of 
annotations include textual additions or various visual cues such as icons, 
highlighting, or color coding for mapping attribute values to visual representations. 

Footnotes in printed material usually are presented together with the text. 
Hypertext annotations are considered less intrusive because they are not shown unless 
the reader activates them [10, 11]. Another distinction refers to the process of link- 
based reference, which can continue indefinitely in distributed hypertext systems. 
Writing in layers is both possible and tolerable, and accessing multi-layered, multi- 
dimensional structures is facilitated by sophisticated user interfaces [7]. 



2.1 Theoretical Foundations 

As early as the Middle Ages, biblical, legal, and medical texts attracted extensive 
commentaries in the form of annotations. As far as traditional print media are 
concerned, annotations either remove obscurities or manifest external sources [12]. 
When the concept is extended to hypertext, the extra functions of user participation 
that it provides are seen as liberating and empowering by some and oppressive and 
authoritarian by others [17]. Contradicting obsolete concepts of authorship decisively 
shaped by romantic theories of the solitary genius, a model we have come to associate 
most strongly with the figure of William Shakespeare [13], hypertext annotations 
allow authors and readers to incrementally augment Web documents. They 
interactively create an ancillary structure that captures some aspect of the meaning of 
those documents [14]. The bridging of the perceived gap between production and 
consumption of symbolic artifacts questions one of the most profound ideological 
divides in the social reality of modern Western society and therefore became a highly 
contested ground. Many traditional thinkers feel cultural achievements to "be 
threatened with oblivion by the brave new world of technology" [15]. 

By allowing users to explore the multilayered hierarchical space between creativity 
and passivity and to become secondary authors within the constraints laid down by 
the primary author, annotations draw attention to the fact that there have been 
previous readers [16, 17]. Especially for historical documents which were written, for 
example, under different social circumstances, actual, potential, and possible readers 
have to be distinguished when “marrying the daily knowledge of the past with the 
partial ignorance of the present” [12]. Similar concerns arise when addressing groups 
of readers with heterogeneous backgrounds and expertise. 

2.2 User Modeling 

In addition to manually added commentaries, automated systems that annotate 
documents with information retrieved from search engines, databases, or newsgroups 
are useful in contextualizing the document to the reader's interests stored in a user 
model. A (set of) user model(s) should be central to the design process and substantially 
influences the basic functionality of Web information systems. Figure 1 depicts the 
conceptual structure of a Web information system with an embedded user modeling 
component [18]. Individual user profiles are typically stored in a separate database. 
Balancing occasionally conflicting requirements stipulated in the user and (system) 
task models, the inference engine customizes the application structure on the basis of 
the application model and immediately generates the Web documents requested by 
the user. 

[Figure: User (Human) → Interface (Browser) → Application] 

Figure 1. User Modeling Component [18] 

Experimental results confirm that even a minimalist user modeling component can 
improve the subjective measure of user satisfaction at low cost and with negligible 
commercial disruption. Finin formulates a number of questions to describe the space 
of existing user models, which spans four dimensions [19, 20]: 

• Who is being modeled? The model's degree of specialization basically 
distinguishes between individual users versus classes of users. Advanced tools for 
defining and managing user models will also have to address the question of single 
identity versus multiple identities. The customer's demand for privacy and 
flexibility will require additional functionality to switch between multiple 
identities, encompassing sub-identities for personal, corporate, or even anonymous 
Web access [21]. 

• What is being modeled? Gender, age, economic status, ethnic origin, educational 
background, professional experiences, and language belong to the primary 
audience attributes in media research [22], but not all of them are readily available 
in Web-based environments. Thus, user models tend to focus on information about 
the online lifestyle of a particular user, comprising her identity, email address, 
shopping choices, hobbies and interests, preferred methods of payment, 
subscription services, and so forth. In many cases it is difficult to determine which 
elements have to be incorporated into the model, and what level of abstraction 
should optimally be chosen. 



• Model acquisition and maintenance: Many of the approaches embodied in research 
prototypes are far too complex to be of practical use in commercial applications. 
Even integrated approaches of simultaneously gathering implicit and explicit 
customer feedback have certain limits as far as the granularity of information is 
concerned. Among the concerns are the performance overhead inevitably incurred, 
time and expenses to build up and maintain the user modeling component, and 
often unproven advantages of the added functionality [18]. 

• Why is the model there? User models help understand the user's information-seeking 
behavior, offer general support and advice, provide adaptive output to the 
user, and obtain feedback in a consistent way. 



2.3 Annotation System Architecture 

The ubiquity of Web content and constraints regarding the capability and efficiency 
of the current Web infrastructure motivate the need for lightweight, efficient, 
non-intrusive (preferably transparent), platform-independent, and scalable Web 
annotation systems that usually are based on abstract intermediary architectures. 
These architectures serve as paratextual expansion joints in the client-server 
connection, being able to customize communication on a per-interaction basis [23]. 

Figure 2 conceptualizes an annotation system architecture with the interceptor 
tapping into a client-server interaction, triggering the annotation process, and 
invoking the composer to produce the annotated content. The annotation sets are 
retrieved from the annotation repository in accordance with the document, user 
model, and context. In case of hierarchical user models, this process may require the 
merging of private, group and public annotations. Vasudevan and Palmer distinguish 
between stylistic, versioned and semantic composers [23]. While stylistic composers 
only locate the annotation sets, anchor them, and choose a customized presentation 
scheme to visually distinguish them from the document content, versioned composers 
take the versioning semantics of both the document and the annotation sets into 
account. The most sophisticated approach, semantic composing, does not exclusively 
rely on explicit authoring but allows knowledge-based processing on the basis of 
annotation microlanguages. 

Figure 2. Conceptual Annotation System Architecture [23] 
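
The flow described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture of [23]: the names Annotation, AnnotationRepository, compose_stylistic, and intercept, the in-memory repository, and the bracketed rendering are all hypothetical. The sketch shows an interceptor that retrieves the annotation sets for a document, merges private, group, and public annotations for a hierarchical user model, and passes them to a stylistic composer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    anchor: str   # text in the document that the annotation attaches to
    note: str     # the annotation body
    scope: str    # "private", "group", or "public" (hypothetical scopes)
    owner: str    # user id or group id; unused for public annotations


@dataclass
class AnnotationRepository:
    """Hypothetical in-memory annotation repository keyed by document URL."""
    store: dict = field(default_factory=dict)

    def add(self, url, annotation):
        self.store.setdefault(url, []).append(annotation)

    def lookup(self, url, user, groups):
        """Merge private, group, and public annotation sets for one user."""
        merged = []
        for a in self.store.get(url, []):
            if a.scope == "public":
                merged.append(a)
            elif a.scope == "group" and a.owner in groups:
                merged.append(a)
            elif a.scope == "private" and a.owner == user:
                merged.append(a)
        return merged


def compose_stylistic(document, annotations):
    """Stylistic composer: anchor each annotation and mark it visually."""
    for a in annotations:
        # Bracketed text stands in for a customized presentation scheme.
        marked = f"{a.anchor} [{a.scope}: {a.note}]"
        document = document.replace(a.anchor, marked, 1)
    return document


def intercept(url, document, user, groups, repository):
    """Interceptor: tap into the response and trigger the annotation process."""
    annotations = repository.lookup(url, user, groups)
    return compose_stylistic(document, annotations)


repo = AnnotationRepository()
repo.add("/doc", Annotation("stretchtext", "see Section 3", "public", ""))
repo.add("/doc", Annotation("user model", "my reading notes", "private", "alice"))
repo.add("/doc", Annotation("user model", "bob's notes", "private", "bob"))
page = "A user model can drive stretchtext."
print(intercept("/doc", page, "alice", {"staff"}, repo))
```

A versioned or semantic composer would replace compose_stylistic; the interceptor and the repository lookup would stay the same in this sketch.
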

3 Content-Level Adaptation 

A variety of methods may be used to increase the usability of complex Web 
information systems. Nielsen classifies them into the following categories of content- 
based adaptation, which all benefit from or require an accurate user model [24]: 
Aggregation (showing a single unit that represents a collection of smaller ones, both 
within and across Web information systems), summarization (representing a large 
amount of data by a smaller amount; e.g., textual excerpts, thumbnails, or sample 
audio files), filtering (eliminating irrelevant information), and elision (using only a 
few examples for representing numerous comparable objects). 

Technically, the term content-level adaptation refers to all different forms of data 
embedded in hypertext documents. In practical terms, however, almost all prototypes 
and implemented systems concentrate on textual segments, neglecting visual and 
audiovisual forms of data. A good indicator for the granularity of adaptivity is the 
average length of textual segments. Scope for contextual variability is introduced by 
establishing an independent format for storing these segments, and by incorporating a 
flexible mapping algorithm to provide various types of traversal functions [25]. A 
large number of very short textual elements ensures maximum flexibility and 
(potentially) a very exact match between the presented information and the user's 
actual needs. However, the required efforts to maintain the database as well as the 
rule set significantly increase with the number of distinct elements. One of the simpler 
but nevertheless quite effective low-level techniques for content adaptation is 
conditional text (also referred to as canning or conditionalization), which requires the 
information to be divided into several chunks of texts [4, 25]. Each chunk is 
associated with a condition referring to individual user knowledge as represented in 
the user model. Only those chunks appropriate for the user's current level of domain 
knowledge are considered when generating the document. The granularity of this 
technique can range from node-level adaptivity (i.e., storing different variations of 
whole documents) to very fine-grained approaches based on sentences or even smaller 
linguistic units. It is generally not easy to specify the optimal length of textons for a 
specific application, which is to a large extent determined by the trade-off between 
granularity and maintenance intensity. 
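
Conditional text as described above can be illustrated with a minimal Python sketch. The three knowledge levels, the Chunk class, and the generate_document function are assumptions made for this example; a real system would read the level from its user model and the chunks from its own storage format.

```python
from dataclasses import dataclass

# Hypothetical ordering of domain-knowledge levels in the user model.
LEVELS = {"novice": 0, "intermediate": 1, "expert": 2}


@dataclass(frozen=True)
class Chunk:
    text: str
    min_level: str = "novice"   # lowest knowledge level the chunk is meant for
    max_level: str = "expert"   # highest knowledge level the chunk is meant for


def generate_document(chunks, user_level):
    """Keep only the chunks whose condition matches the user's level."""
    level = LEVELS[user_level]
    selected = [
        c.text for c in chunks
        if LEVELS[c.min_level] <= level <= LEVELS[c.max_level]
    ]
    return " ".join(selected)


chunks = [
    Chunk("A link connects two documents."),
    Chunk("Links are the underlined words you can click.", max_level="novice"),
    Chunk("Typed links carry a semantic label.", min_level="intermediate"),
]

print(generate_document(chunks, "novice"))
print(generate_document(chunks, "expert"))
```

Here each chunk is one sentence. Storing whole documents as chunks gives node-level adaptivity, while smaller chunks increase both flexibility and the number of conditions to maintain.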

The term stretchtext denotes a higher-level technique. The idea is to present a 
requested page with all stretchtext extensions that are not relevant to a particular user 
collapsed. While reading the document, the user is able to collapse optional 
chunks of text and uncollapse the corresponding terms whenever (s)he desires. 
Applications of stretchtext can be categorized along two dimensions [26]: placement 
of the text relative to the original, either at the beginning (prefix) or the end 
(appended), embedded inside the old lexia or completely replacing it; and granularity, 
understood as the average length of lexias, usually based on graphical forms such as 
words, sentences, or paragraphs. 

By activating and closing stretchtext extensions, the user creates summary and 
ellipsis, by means of which the articulated discourse is shortened [16]. Consequently, 
one of the main advantages of stretchtext is that it lets both the user and the system 
adapt the content of documents, giving the user the possibility to override the 
information stored in the user model. Similar concepts were already found in Augment 
and several early text editors at Xerox PARC [10]. 
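
The interplay between system-side and user-side adaptation in stretchtext can be sketched as follows. The Stretch and StretchtextPage classes, the topic keys, and the appended placement in parentheses are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Stretch:
    term: str              # always-visible text
    extension: str         # optional text that can be collapsed
    topic: str             # hypothetical key into the user model
    expanded: bool = False


class StretchtextPage:
    def __init__(self, segments, user_interests):
        self.segments = segments
        # System-side adaptation: expand only what the user model marks relevant.
        for s in self.segments:
            s.expanded = s.topic in user_interests

    def toggle(self, index):
        """User-side adaptation: override the state derived from the user model."""
        self.segments[index].expanded = not self.segments[index].expanded

    def render(self):
        parts = []
        for s in self.segments:
            # Appended placement: the extension follows the term it belongs to.
            parts.append(f"{s.term} ({s.extension})" if s.expanded else s.term)
        return " ".join(parts)


segments = [
    Stretch("Fisheye views", "a distortion-oriented technique", "visualization"),
    Stretch("rank links", "by degree of interest", "metrics"),
]
page = StretchtextPage(segments, user_interests={"metrics"})
print(page.render())
page.toggle(0)   # the reader expands the first extension by hand
print(page.render())
```

The toggle method is the point at which the user overrides the information stored in the user model.
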

The most powerful content adaptation technique is based on frames and 
presentation rules where slots of a frame can contain several different explanations of 
the concept, links to other frames, or additional examples. Usually a subset of slots is 
presented in order of decreasing priority [3]. Research on natural language generation, 
which aims to produce coherent natural language text from an underlying 
representation of knowledge [27], provides valuable insights for implementing such 
an advanced content adaptation technique. The generation process requires both text 
and discourse planning. While text planning comprises all the choices of what to say 
in the document, discourse planning determines the optimal sequence of textons in a 
coherent way. Ensuring coherency is a complex task and may be achieved by 
selecting a prepared discourse plan from an appropriate repository, and instantiating 
this plan using textons from the knowledge base [28]. 
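
The simplest part of the frame-based technique, presenting a subset of slots in order of decreasing priority, might look like the sketch below. The Slot class, the numeric priorities (higher first), and the present function are assumptions for this example; text and discourse planning are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    kind: str       # e.g. "definition", "example", "link"
    content: str
    priority: int   # higher values are presented first (assumed convention)


def present(frame, max_slots, allowed_kinds):
    """Presentation rule: pick permitted slots, highest priority first."""
    candidates = [s for s in frame if s.kind in allowed_kinds]
    ranked = sorted(candidates, key=lambda s: s.priority, reverse=True)
    return [s.content for s in ranked[:max_slots]]


frame = [
    Slot("example", "A worked example.", 2),
    Slot("definition", "A short definition.", 3),
    Slot("link", "Link to a related frame.", 1),
]

print(present(frame, 2, {"definition", "example", "link"}))
print(present(frame, 2, {"example", "link"}))
```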



4 Link-Level Adaptation 

"The electronic age is the age of globally dispersed capitalist economies, of 
transnational corporations, of complexly mediated and hybridized cultural 
experiences, and, above all, of distance ... presented as proximity" [13]. While urban 
planning and social anthropology use the terms proxemic and dystemic to describe the 
closeness relationship of people and spaces, Web user proximity can be regarded as a 
function of the actual distance and the cognitive distance between the person and the 
(virtual) space. As such it is concerned with the user's mental state and the perceived 
relationship to the history-rich objects. 

Adaptive hypertext systems may be more or less proxemic based on the quality of 
link-level adaptation techniques, which rely on how well the users' past experiences 
and knowledge are represented in the user model. Basic link-level adaptation 
techniques, which usually aim at decreasing the cognitive overload [29] caused by 
complex Web information systems, can be grouped into four categories: 

• Providing relevant starting points in the information space [5]. 

• Influencing link perception. While it is possible to either show a menu of available 
destinations or to open a separate window for each of the destinations, adaptive 
architectures are able to automatically reduce the set of available destinations 
according to the current interests of the user, or even choose one particular link. 
Once numeric interest values have been computed, it is relatively straightforward 
to transform that knowledge into interactive visualizations that consider the user’s 
current interests with respect to their current location. In his paper on fisheye 
views, which belong to the category of distortion-oriented visualization techniques, 
Furnas, for example, proposes a formal degree of interest metric [30, 31]. This 
metric enables adaptive systems to identify and display only those parts of the 
document tree that are of greatest relevance to a particular user. The degree of 
interest (DOI) in a is given by DOI(a | . = b) = API(a) - D(a, b), where b is the current 
point of focus, API(a) the a priori importance (= preassigned values for each point 
in the structure under consideration), and D(a, b) the distance between point a and 
the current focus [32]. There are a number of mechanisms to convey and visualize 
the presumed relative importance of a document to the user: Hiding, on the one 
hand, actually restricts browsing to smaller subspaces for inexperienced users. 
Dimming, on the other hand, decreases the cognitive overload as well, but leaves 
dimmed links still visible and traversable, if required. Highlighting is the most 
popular mechanism. Its attributes include boxing, blinking, color (hue, saturation, 
brightness), texture, and reverse video. As lateral and temporal masking negatively 
impact the other attributes, color coding remains the method of choice for most 
applications [33-35] (see, for example, the Personal WebWatcher, a prototype for 
customized browsing [36, 37]). More specifically, using color for highlighting 
decreases search times if there is at least a 50 percent chance that the highlighted 
element is the target. This quality is termed highlight validity, which can be 
increased by providing graphic link indicators for localizing relevant links [38]. 

• Sorting presents recommendations in the form of a sorted list of links and thereby 
transforms the complex problem of advice giving into the much simpler problem 
of rank ordering a list [39]. Occasionally, the non-stable order of links introduced 
by adaptive sorting may lead to incorrect mental maps, especially in the case of 
inexperienced users [4]. To avoid confusion and optimize the interface 
representation, primacy and recency effects should thus be taken into account when 
compiling such a list. The first and last words of a given list usually have the 
highest impact on the user's memory. Recall for the central words (or links) is 
worse because some never make it into short-term memory due to its limited 
capacity, or because some have already decayed from it. "Roughly seven (plus or 
minus two) words can be placed in short-term memory without exceeding its 
capacity. Left alone and unrehearsed, a single word will persist in some form for 
up to 18 seconds" [33]. 

• Annotating (history-based versus user model based) and semantic link labels, 
which can be regarded as a subcategory of annotations from a theoretical 
perspective. 
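
The degree of interest metric from the list above can be made concrete with a small Python sketch. The site tree, the choice of API(a) as minus the depth of a, the use of tree distance for D(a, b), and the threshold are assumptions for this example.

```python
# Hypothetical document tree given as child -> parent (the root maps to None).
PARENT = {
    "home": None,
    "research": "home",
    "teaching": "home",
    "labs": "research",
    "projects": "research",
    "courses": "teaching",
}


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def distance(a, b):
    """D(a, b): number of edges between a and b in the tree."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    common = set(up_a) & set(up_b)
    # Steps from each node up to the lowest common ancestor.
    steps_a = next(i for i, n in enumerate(up_a) if n in common)
    steps_b = next(i for i, n in enumerate(up_b) if n in common)
    return steps_a + steps_b


def api(a):
    """API(a): assumed here to be minus the depth of a (root = 0)."""
    return -(len(path_to_root(a)) - 1)


def doi(a, focus):
    """DOI(a | . = focus) = API(a) - D(a, focus)."""
    return api(a) - distance(a, focus)


def visible(focus, threshold):
    """Nodes whose degree of interest reaches the threshold."""
    return sorted(n for n in PARENT if doi(n, focus) >= threshold)


print(visible("labs", -2))
print(visible("labs", -4))
```

With the focus on "labs" and a threshold of -2, only the focus and its ancestors remain visible; lowering the threshold to -4 also reveals the sibling "projects" and the branch "teaching". Nodes below the threshold could be hidden or dimmed.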

The following paragraphs focus on the last category, annotations, and the 
importance of more advanced mechanisms for specifying semantic link labels within 
Web documents. Due to limitations in the current infrastructure of the World Wide 
Web, most applications lack mechanisms to show mutual dependencies and co- 
constitution among possible categories of thought [11]. Many systems do not provide 
links that allow the user to anticipate where the link will lead them, or to clearly 
distinguish them from other links located in the vicinity [40]. Semantic link labels 
address this problem of inexpressiveness [14] by conveying information about a link's 
purpose and destination, its relevance in the current context, its creator, or its date of 
creation, making it possible for users to evaluate whether to follow a link without 
having to select it [41]. By providing the user with meta information about the 
relationships between documents, they increase cohesion and help maintain context 
and orientation in non-linear Web presentations. 

Current Web browsers only support history-based signaling of whether a link has 
already been followed by the user. This is only a very basic mechanism that does not tap 
the full potential of hypertextual navigation. It can easily be extended by including a 
categorization of hypertext links. Labeling them in a consistent manner according to 
their type yields the following categories: (a) intratextual links within a document 
(related content marked by anchors, annotations, footnotes, citations, figures, etc.); (b) 
intertextual links between documents of a particular Web information system; (c) 
intraorganizational links to other systems of the same company; (d) 
interorganizational links to systems of other corporations within the same business 
ecosystem; (e) external links to sources of information different from (a)-(d), located 
outside the corporation's business ecosystem. 
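
A classifier for the five categories (a)-(e) could be sketched as follows. The host lists, the example URLs, and the simplification that one host corresponds to one Web information system are assumptions made for this illustration; only the standard urllib.parse module is used.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical host lists; a real system would load these from configuration.
ORGANIZATION_HOSTS = {"www.example.com", "shop.example.com"}
ECOSYSTEM_HOSTS = {"partner.example.org"}


def classify_link(page_url, href):
    """Return the category label (a)-(e) for a link found on page_url."""
    page = urlsplit(page_url)
    target = urlsplit(urljoin(page_url, href))
    if (target.netloc, target.path) == (page.netloc, page.path):
        return "intratextual"          # (a) same document, e.g. a fragment
    if target.netloc == page.netloc:
        return "intertextual"          # (b) same Web information system
    if target.netloc in ORGANIZATION_HOSTS:
        return "intraorganizational"   # (c) other system of the same company
    if target.netloc in ECOSYSTEM_HOSTS:
        return "interorganizational"   # (d) same business ecosystem
    return "external"                  # (e) everything else


page_url = "http://www.example.com/docs/intro.html"
for href in ["#notes", "maps.html", "http://shop.example.com/",
             "http://partner.example.org/x", "http://other.example.net/"]:
    print(href, "->", classify_link(page_url, href))
```

The returned label could then drive a consistent visual marker, such as an icon or a color, next to each link.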

Combining these categories with explicit annotations (e.g., numerically or via color 
coding) of the percentage of people who followed each of the links off the current 
page further increases the system's usability [42]. This mechanism may also 
incorporate an adaptive component, considering only those links relevant to a user 
and computing percentages exclusively for this subset. Trigg presents a rather 
elaborate taxonomy of 75 semantic link labels to distinguish among different forms of 
relationships between nodes [43]. The taxonomy comprises both commentary link 
types, which are basically equivalent to the concept of annotations, and regular link 
types such as abstraction, example, formalization, application, rewrite, simplification, 
refutation, support, data, and so forth. Although allowing users to define new link 
types represents an extensible alternative to such a fixed set of link types, there are 
three main reasons for not providing such a facility: 

• Explosion of link types. Without restrictions, users would possibly flood the system 
with unmanageably many new link types. 

• Reader confusion. It seems unlikely that the name of a link type originally chosen 
by its creator would be sufficient to convey its meaning to future readers. This in 
turn could lead to misuse of the new link type by later critics. 

• System confusion. The semantics of some link types are partially understood by the 
system. Creators of new types would have to somehow define the type and its 
special features to the system. 
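
The percentage annotation mentioned before this list, including its adaptive variant restricted to the links relevant to a user, can be sketched as follows. The function name, the click-count dictionary, and the rounding to one decimal place are assumptions for this example.

```python
def follow_percentages(click_counts, relevant_links=None):
    """Share of traversals per link, optionally restricted to a relevant subset."""
    if relevant_links is not None:
        click_counts = {
            link: n for link, n in click_counts.items() if link in relevant_links
        }
    total = sum(click_counts.values())
    if total == 0:
        return {link: 0.0 for link in click_counts}
    return {link: round(100.0 * n / total, 1) for link, n in click_counts.items()}


# Hypothetical traversal counts for the links on the current page.
counts = {"staff.html": 60, "labs.html": 30, "news.html": 10}

print(follow_percentages(counts))
print(follow_percentages(counts, relevant_links={"staff.html", "labs.html"}))
```

In the adaptive case the percentages are recomputed over the relevant subset only, so they again add up to roughly 100.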



5 Meta-level Adaptation 

While the use of the term navigation itself represents a metaphor, Powell et al. 
distinguish between three types of navigational support: textual, visual, and 
metaphorical [44]. Covering all three categories, this section will focus on 
supplemental navigation systems (in contrast to the primary navigational system 
comprising contextual and non-contextual links as discussed in the previous section). 
Supplemental navigation systems are used to locate and interpret a given item of 
information, providing full context by verifying the relation between different items 
and the virtual spaces surrounding them. They should include mechanisms to signify 
the user’s current location, and to retrace her individual steps. Such systems, however, 
become more than mechanisms to navigate a virtual space; they become crucial 
textual elements themselves, replete with their own interpretive assumptions, 
emphases, and omissions [8]. To support the different preferences, levels of technical 
knowledge, and cognitive styles of their users, advanced Web applications usually 
employ a combination of the following supplemental navigation systems: 

• Site maps. In contrast to handcrafted representations of hypertext structures, 
automatically generated site maps are composed on the fly by the underlying 
system according to the system’s topology and a set of pre-supplied layout rules. 
Site maps generated with prevalent modeling tools such as the WebArchitect, 
SchemaText, Microsoft's FrontPage, Adobe's GoLive, or Macromedia's 
Dreamweaver, to name a few, provide some authoring-in-the-large functionalities 
but often lack semantic richness and advanced interactive features [45, 46]. 

• Site indexes and tables of contents add a second-level structure (often alternatively 
referred to as thesaurus, semantic net, domain knowledge, or index space) on top of 
the basic document hypertext structure in order to provide more flexible and 
intelligent navigation [47]. 

° Site indexes are alphabetical lists of terms that are similar to the manually 
created index usually found at the back of books. They are useful in that they 
are not constrained by the site's hierarchy, in contrast to tables of contents or 
site maps [48]. Compared to other supplemental navigation systems, they 
reference very granular bits of information. Their high number of entries and 
relatively flat hierarchical structure are two additional attributes that 
differentiate site indexes from tables of contents. They are capable of 
summarizing content, but usually fall short in adequately presenting 
structural information [35]. 

° Graphically enhanced tables of contents do not require the user to parse 
textual path information. The WebToc prototype, for example, automatically 
generates an expandable and contractible table of contents, indicating the 
number of elements in branches of the hierarchy as well as individual and 
cumulative sizes. Color mapping is used to represent file type (text, images, 
audio, and other), while the length of the bars signifies their overall size [49]. 

• Direct guidance. The tension between the content of a document as freely
interpreted versus the content relevant for a specific situation (e.g., describing 
requirements and steps to complete a certain task) creates uncertainty. Guided 
tours in the sense of predefined trails as originally suggested by Vannevar Bush in 
1945 [10, 50] provide a good example of indexicality and presentational 
linearization. They try to overcome the uncertainty mentioned above by connecting 
to the next recommended step. They can be understood as virtual discourse that is 
produced by an absent author [51, 52]. 

• Retrospective access via:

° Chronological or parameterized backtracking to relate the current context to 
previously covered information; refer to Nielsen for an overview of 
conceptual backtracking models [10]; and

° Topic or access histories /5i/. 

• Hierarchical bookmark lists (also called Hotlists or Favorites) that enable users to
tag elements perceived to be of long-term importance so that they can directly 
return to a particular document without having to remember and retrace the 
original pathway [8]. 

• Access to search engines and databases using queries (full text, via keywords, or
based on personal conceptual descriptors; [47]) partially eliminates the need to pre- 
organize the information space in a hierarchical or alternatively structured way. It 
is particularly useful when the user precisely knows the nature of the information 
she is seeking [54]. 
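
Two of the supplemental navigation systems listed above, site maps and site indexes, differ mainly in whether they follow the site's hierarchy. The following is a minimal sketch of that difference, assuming a hypothetical topology given as a parent-to-children mapping; the page names and both function names are illustrative and are not taken from any of the tools cited.

```python
# Hypothetical site topology: parent page -> child pages.
TOPOLOGY = {
    "Home": ["Products", "Support"],
    "Products": ["Editors", "Servers"],
    "Support": ["FAQ"],
}


def site_map(root, topology, depth=0):
    """Return indented lines that mirror the site's hierarchy."""
    lines = ["  " * depth + root]
    for child in topology.get(root, []):
        lines.extend(site_map(child, topology, depth + 1))
    return lines


def site_index(topology):
    """Return all page titles as a flat alphabetical list, ignoring hierarchy."""
    titles = set(topology)
    for children in topology.values():
        titles.update(children)
    return sorted(titles)


print("\n".join(site_map("Home", TOPOLOGY)))
print(site_index(TOPOLOGY))
```

The site map is regenerated from the topology on every call, so it stays consistent with the structure, while the index deliberately discards that structure in favour of a flat, granular list.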
6 Conclusion



This paper introduced a classification framework for adaptive Web representations. 
Adaptive components extend an information system's functionality and replace 
general-purpose documents that are written according to a wide audience model and 
the user's anticipated needs. While being motivated by a user-centered design 
perspective, the question goes beyond the scope of interface design or document 
presentation. Current efforts include the development of Web-based architectures that 
take advantage of adaptive system behavior. Emphasizing the role of annotations, the 
described framework comprises three categories of information, and their 
corresponding interface representation: Content of documents, primary navigational 
system including the links between and within these documents, and supplemental 
navigational systems such as index pages, trails, guided tours, or overview maps. 
Abstract. The expanding role of the Web as a content and applications
deployment platform and the appearance of new computing paradigms,
such as thin-client computing, require now more than ever the
introduction of concrete development frameworks. Although new
approaches, technologies, tools, and commercial applications appear daily,
limited guidelines or frameworks exist that can assist Web developers
in selecting the proper methodology and tools for the design,
implementation and maintenance of flexible Web content and applications.
Our work, triggered by our experience in implementing the Web
presence of several large Greek Governmental organisations, attempts
to address the major current and forthcoming problems that Web
developers face. We propose a framework (RDF/XML based) that will act as
a malleable development support environment, incorporating specific
guidelines, which Web developers should always consider. The primary
goals are achieving scalability (modular, component-based
architecture), re-usability and technology independence in Web development.
We focus on hypermedia content and applications.



1 Introduction 

In this introductory section, we examine the concept of the Web application domain, 
as seen from the aspect of the developer, and the typical components of Web applica- 
tions. The Web developer, once seen as a simple document author, nowadays is typi- 
cally responsible for building complete Web sites (and sometimes distributed online 
environments), incorporating different Web applications, technologies and varied 
content. These applications can be distinguished into two basic categories, Web Hy- 
permedia Applications and Web Software Applications. 

A Web Hypermedia Application is the structuring of an information space in terms
of nodes (chunks of information), links (relations among nodes), anchors, and access
structures, and the delivery of this structure over the Web. The developer of such
applications is therefore often asked to develop novel hypermedia applications
with new content explicitly for the Web, to port or import existing information from
unstructured documents or other hypermedia systems into the Web, or to map (dynamically
or statically) information sources stored within a logical, structured space (e.g. a
database) into a collection of Web pages.

A Web Software Application is any “classic” software application that depends on 
the Web, or uses the Web infrastructure, for its correct execution. In most cases, the 
hypertext model is not appropriate for modeling the content of these applications, and 
other models (e.g. Object-Oriented, Entity-Relational) are employed. This category 
includes, among others, legacy information systems accessible through online gate- 
ways via a Web browser, such as databases (DBs), booking systems, knowledge- 
bases, numerical software, etc. 

In accordance with the paradigm of on-line computing, with either thin- or full-clients,
we consider that a generic Web application consists of three main parts: (a)
Information (content/data) Structure, incl. analysis, design, structure, storage,
management, and reuse of data, (b) Application Logic, namely static and dynamic navigation,
filtering, personalization (in Web Hypermedia apps) or more generic processing (in
Web Software apps) and (c) User Interface (UI).

This three-tier architecture can also be expanded to a multi-tier model, where the 
Application Logic is implemented by different layers of software. This approach of 
isolating content, application logic and interface is a tried and true one, and is rec- 
ommended for most Web applications. Based on it, our work proposes an abstract 
architecture of modules that should be provided to the developer in order to support 
the life-cycle of each layer, emphasizing easy maintenance, modularity,
extensibility, re-use and interoperability of information or application components. Special
attention is given to using a data-centric, rather than application-centric, development 
process. 

For the data layer, we examine issues of effective content modeling (incl.
development, management) of data resources, in order to construct “smart”, human- and
machine-readable (self-describing) Web resources, accessible from modern Web 
applications (content pool). We consider any kind of content, regardless of (i) com- 
plexity of the content structure (ii) media type (iii) potential application domain and 
(iv) type of Web application where the content may be used. 

For the application logic and UI layers, we shall focus on Web Hypermedia appli- 
cations. We examine application and UI modeling methodologies, that will allow the 
developer to implement apps that can transparently access and exploit the resources 
described above, as well as store their own application resources (content or logic) 
back into the data layer, enabling reuse by other Web applications. 

On the issue of Web Software Applications (app logic, UI), we consider that this is 
rather a Software Engineering problem. However, in this case also common guide- 
lines of Web Hypermedia application development can be used, such as: 

• Focus on modular implementations, preferably using components 

• Easily configurable query and results rendering mechanisms 
2 Current Efficient Web Development

We consider that three main parts constitute a Web application: Information
Structure, Application Logic and User Interface. This separation is clear in Figure 1,
which illustrates the current Applications and Resources of the WWW. We
distinguish four classes of Web applications, according to the implementation of these three
parts. The applications of class APP1 (a typical example is a collection of static
HTML pages) are the worst regarding maintenance and re-use. For instance, if
the developer wants to modify any of the three parts of the application, she/he has to
modify every single resource. Thus, HTML authoring (or converting resources to
HTML pages) is the most inefficient way of developing Web Hypermedia
applications. However, most current Web developers still develop that way. In APP3,
information structuring is embedded into the application logic, and so much of the
knowledge about the information structure, which is utilized by the application logic,
is lost once the application has been constructed. On the other hand, applications of
type APP4, which separate information structure, application logic and interface,
provide easy maintenance and high reusability.
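
The difference between a collection of static pages and an application that separates its three parts can be made concrete with a small sketch. The records, the `select` function and the `PAGE` template below are hypothetical and only illustrate the separation; they are not part of the framework proposed in this paper.

```python
from string import Template

# Information structure: content only, no markup (hypothetical records).
CONTENT = [
    {"title": "Opening hours", "body": "Weekdays 9-17."},
    {"title": "Contact", "body": "Write to the front desk."},
]


# Application logic: selects content and knows nothing about HTML.
def select(content, keyword):
    return [item for item in content if keyword.lower() in item["title"].lower()]


# User interface: one template, applied to whatever the logic returns.
PAGE = Template("<h1>$title</h1><p>$body</p>")


def render(items):
    return "\n".join(PAGE.substitute(item) for item in items)


print(render(select(CONTENT, "contact")))
```

Changing the look of every page means editing `PAGE` once, whereas with hand-authored static pages the same change would have to be repeated in every resource.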



Fig. 1. Current Web Applications and Resources



For the development and management of Web Hypermedia Applications, the de- 
velopers, in the best case, are using primary Data Resources, existing “Theoretical” 
Resources & Tools and they follow a Development & Maintenance Procedure. 
2.1 Data Resources

As stated, the most common task for a Web developer is the development of a Web 
site, containing a set of Web applications. For the development of these applications, 
the developer is usually working on a set of heterogeneous and probably distributed 
Data Resources that have either been generated by other applications or are produced 
for the specific application needs. Such data resources are DBs, documents, HTML
pages, multimedia files, 3D world files, programs, etc. The basic problem with data
resources from other applications is that they were designed, implemented or
manipulated according to their application context. We call them “application-oriented”
resources, and they are difficult to reuse. They usually need remodeling or
transformation to fit the specific application needs. This requires much effort and time, and
the result is again a collection of application-oriented resources.




Fig. 2. “Theoretical” Resources and Tools in current efficient development 



2.2 “Theoretical” Resources and Tools 

Apart from the Data Resources, the developer can use existing “theoretical” resources 
(methods, models, etc.) and their corresponding tools (if available). Figure 2 illus- 
trates how the developers can use the “theoretical” resources and their tools in order 
to implement a Web Hypermedia Application based on primary Data Resources.

The “theoretical” resources include:

• Conceptual Models: For the design of the three parts of the Web applications, 
the developer utilizes well-known and adopted conceptual models that mainly 
derive from Software Engineering and DB concepts, like Object-Oriented (OO),
Entity-Relational (ER), Labeled Directed Graph (LDG) and Component-Based 
Design (CBD). 

• Development Methods: Design methodologies for the construction of the con- 
ceptual, navigational and interface designs of the three parts and how they are 
related. Existing Development Methods are based on and incorporate a certain
conceptual model. 

• Design Patterns [1,2] for hypermedia applications development. Design
patterns address the reuse of design experience at any of the several aspects of
design and development: conceptual, navigation, and interface modeling;
rhetorics; implementation, including specific development environments (e.g. Web);
development process.

• Development Process Models: Various ways that the Development Methods can
be applied in order to increase their performance. By process we mean activities 
that are carried out, the relationships between these activities, the resources that 
are used, the results that are created, etc. Such Process Models are the waterfall 
model [3] and the Spiral Model [4] in Software Development and IMPACT-A 
[5] in hypermedia development. Testing and Quality Assurance stages are in- 
cluded within the models. 

• Technologies: Any technology that addresses data representation for
storage/management/exchange (e.g. XML, RDF, RDBMS, etc.) and application
logic or interface implementation and management (e.g. CSS/XSL, HTML,
DHTML, Java, Cookies, Plug-Ins, ASP, JSP, etc.).

Some of these “Theoretical” resources may have the support of a set of tools. Such 
tools could be: 

• Conceptual Model Tools, e.g. RDBMS, OO design environment, etc.

• A Development Environment. A set of tools that supports the corresponding De- 
velopment Method. It provides support for the storage (in the so-called applica- 
tion repository, see Figure 2) and the management of design schemata and ap- 
plication resources. 

• Technology Tools: For each technology, a set of supportive tools is provided for
improved exploitation (e.g. XML Toolkits, HTML editors, Java compilers, Vis- 
ual Interdev for ASP, etc.). 

The Development Method and the corresponding Development Environment con- 
stitute a Hypermedia Application Development and Management System (HADMS). 
In previous work [6], we proposed an evaluation framework for HADMS and we 
evaluated some representative systems (OOHDM[7,8], RMM[9], HSDL[10], STRU- 
DEL[11], Vignette’s StoryServer and Microsoft’s FrontPage). The conclusions that 
were derived from this evaluation helped us identify the crucial issues of the devel- 
opment of Hypermedia Applications. 

2.3 Development and Maintenance Procedure 

Consider the steps that developers have to follow in order to implement a Web Appli- 
cation: 

A. Theoretical Resources Decision Phase 

1. Select one of the application types (we strongly recommend APP4).

2. Select the Development Method (incorporated Conceptual model)/Environment
that better meets the application requirements. The developers may use the
evaluation framework proposed in [6] in order to specify the requirements of the
application under development and decide which method and environment to choose.

3. Select the Technologies that are more appropriate for supporting the requirements 
of the specific application. 

4. Select a Process Model to apply on the Development Method. 

B. Development Phase 

5. Use the Development Method (Conceptual Model) / Process Model and the Design 
Patterns to design the conceptual, navigation and interface Design Schemata. 

6. Use the Development Environment to implement and manage the Design Sche- 
mata. 

7. Produce Application Resources either by transforming data resources or direct 
authoring. 

8. Use the Development Environment to produce the final Web Application, which 
will be either a collection of HTML static pages or an interactive interface to the 
Application Resources or both. 

C. Maintenance Phase 

9. If the developers decide, e.g. for reasons of quality assurance or improved per- 
formance or adaptation to new requirements, to: 

• Replace the Application Type: Go to Step 1.

• Replace the Development Method (incorporated Conceptual model) /
Environment: Go to Step 2.

• Replace some of the Technologies: Go to Step 3.

• Replace the Process Model: Go to Step 4.

• Modify the Design Schemata: Go to Step 10.

• Modify the Application Resources: Go to Step 12.

• Produce new Application Resources: Go to Step 7.

10. Use the Development Method (Conceptual Model) / Process Model and the De- 
sign Patterns to modify the conceptual, navigation and interface Design Sche- 
mata. 

11. Use the Development Environment to apply modifications to design schemata 
and propagate them to the Application Resources. 

12. Use the Development Environment to manage the Application Resources. 

13. Go to Step 8.
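
The jumps in the maintenance phase can be read as a small dispatch table. The sketch below encodes the entry step for each maintenance decision and lists the steps that are then re-executed. It assumes that steps 1 to 8 run in sequence, that steps 10 to 12 run in sequence before step 13 returns to step 8, and that the procedure pauses after step 8 until the next decision; the dictionary keys and the function name are illustrative only.

```python
# Step numbers follow the procedure above; the key names are illustrative.
MAINTENANCE_ENTRY_STEP = {
    "replace_application_type": 1,
    "replace_method_or_environment": 2,
    "replace_technologies": 3,
    "replace_process_model": 4,
    "modify_design_schemata": 10,
    "modify_application_resources": 12,
    "produce_new_application_resources": 7,
}


def steps_to_rerun(decision):
    """List the steps executed after a maintenance decision, ending at step 8."""
    step = MAINTENANCE_ENTRY_STEP[decision]
    if step <= 8:
        return list(range(step, 9))       # steps run in sequence up to step 8
    return list(range(step, 13)) + [8]    # steps up to 12, then step 13 jumps to 8


for decision in MAINTENANCE_ENTRY_STEP:
    print(decision, steps_to_rerun(decision))
```

Read this way, the cost of a maintenance decision is visible directly: replacing the application type repeats all eight steps, while modifying application resources repeats only two.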

3 Basic Requirements of Developers 

It is obvious that different applications may have different requirements. For instance,
a part of a Web site may contain only static pages, while another is a dynamic ap- 
plication over a multimedia DB. In such cases the developer may need to use differ- 
ent conceptual models, development methods, design patterns and technologies, that 
possibly do not inter-operate. For almost every application there is a development 
method that can adequately cover its requirements. However, none of the methods 
can efficiently cover all application domains [6]. Thus, the developer should be able
to use various conceptual models and development methods/environments for various 
applications. 

The basic problems of the current HADMS are the following:

• The development methods incorporate one conceptual model and they cannot 
support others. 

• Most of the methods have no or poor development environment. 

• There is no environment that can support more than one method. 

• The development environments are not easily extensible, in order to effi- 
ciently incorporate upcoming tools. 

• Most of the HADMS do not support efficient importing and exporting either 
for content or design. 

In this work, we stress the lack of a development framework that can address the 
above problems, by covering the following basic requirements. It has to be: 

R1. Modular. The architecture of the framework should be based on a synthesis of
abstract components (modules) that can interoperate through their interfaces, in 
order to support the features of the framework. This will facilitate the maintain- 
ability, extensibility and reliability of the framework, as the components can be 
maintained or replaced separately and by different persons. 

R2. Technology-independent. The interfaces between the components of the frame- 
work should be developed with minimal dependency on technologies that may 
soon become obsolete. 

It should support: 

R3. Several Conceptual Models. Some applications are complex and require OO 
principles, while some others are simple and a labeled directed graph (LDG) is 
enough for modeling their concepts. Moreover, different models may be needed 
for designing e.g. the information structure and the interface of an application. 
Thus, the developer should be able to use various conceptual models for design- 
ing the three application parts. 

R4. Several Methods. The developer should be able to use various methods for the 
development of different applications. For methods that provide an environment, 
this (or part of it) could be integrated in the framework (if it can be appropriately 
tailored). 

R5. Data Importing. The framework should provide robust importing mechanisms 
from existing data resources. 

R6. Reuse of Information and Designs. Developers should be able to reuse and
exchange (through import/export facilities) information and designs across dif- 
ferent applications developed with different models/methods. Moreover, the 
system should incorporate meta-data principles in order to build self-describing, 
machine-readable and human-readable content/application resources, for ex- 
changing resources with other developers. 

R7. Maintenance/management of the applications. Web application services and/or 
information usually change (and in some cases rapidly) throughout the life-cycle 
of the application. Thus the application must be built so that it can be modified,
fixed, or maintained at little cost in time or money. The separation of the
application’s parts enables applications to be maintained more effectively. Basic
maintenance facilities include efficient authoring environments for content,
development tools for supporting the implementation details of the targeted system
(e.g. Web) of the application and, finally, collaboration facilities among the
working group’s members (like versioning, access control, etc.).

R8. Quality Assurance (QA) of content and applications. This involves the behaviour 
of the overall framework and of its individual layers, modules and tools. By 
dealing with the issues of QA at a per-layer level, in addition to using
Development Process Models with embedded testing/QA stages, we establish efficient
workflow paths for developers, avoid problems during the steps of the develop- 
ment process and minimize potential risks for the final output, either content or 
applications. 

4 An RDF/XML Framework for Information and Hypermedia 
Application Management 

In a previous work [6] we focused on defining and analyzing the requirements of 
hypermedia application development. Essentially, the needs of the Web developer can 
be described with abstract modules that are independent of the current development 
methods or models. In this work we propose a generic framework with a modular 
architecture (R1), based on the synthesis of interoperable modules, in order to satisfy
the above requirements. Each module defines a class of tools, i.e. the set of tools that 
can contribute to the functionality of the module. Each module can incorporate sev- 
eral tools. Thus, for the implementation of the system, the appropriate tools for each 
module have to be selected, appropriately tailored and integrated into the framework. 
The main advantage is that when a newer or enhanced tool comes up, it can be inte- 
grated into the framework with minimal consumption of effort and time, while the 
developer benefits from the emerging technologies, methods and tools. 

Apart from the architecture, we need to specify the data model that our framework 
will incorporate. The data model must provide interoperability, extensibility and 
reusability so that a range of applications can access the information data. In order to 
achieve this we employ core metadata principles. The main kinds of metadata are 
schematic, navigational and associative, which can be further categorized into de- 
scriptive (such as Dublin Core [12]), restrictive (such as PICS [13]) and supportive 
(such as dictionaries, thesauri and hyperglossaries). Making the repository self- 
describing and based on metadata structures ensures that knowledge about the re- 
pository structure is available to other applications. 

The Resource Description Framework (RDF) [14] (see Figure 3) is both a
foundation and a mechanism for encapsulating metadata sets and associating them with Web
resources (along the lines of the Warwick Framework [15]); it provides
interoperability between applications that exchange machine-readable information on
the Web. These characteristics led us to adopt the RDF Data Model in our framework
as the primary mechanism for describing and interconnecting data, services and
modules. The RDF data model defines a simple model for describing interrelationships
among resources in terms of named properties and values. RDF properties may be
thought of as attributes of resources and in this sense correspond to traditional
attribute-value pairs. RDF properties also represent relationships between resources. As
such, the RDF data model can resemble an entity-relationship diagram.



[Figure 3 labels: RDF Schemata over Web; XML Namespace Facility; URI (either a link or a script call); RDF Model; Data Resources]



Fig. 3. Resource Description Framework (RDF) Model 

A Data Resource (DR) in the RDF model is anything that can be referred to and/or
accessed through a Uniform Resource Identifier (URI) [16]. The declaration of the 
properties and their corresponding semantics are defined in the context of RDF as an 
RDF schema. A schema defines not only the properties of the DR (Title, Author, etc.) 
but may also define the types of DR being described. 
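
The idea of describing resources through named properties and values can be sketched without any RDF library. In the example below each statement is a plain (resource, property, value) tuple; the URIs, the property names and the toy `SCHEMA` set are hypothetical and stand in for a real RDF schema.

```python
# Each statement is (resource URI, property, value); all URIs are hypothetical.
STATEMENTS = [
    ("http://example.org/report.html", "Title", "Annual Report"),
    ("http://example.org/report.html", "Author", "http://example.org/staff/42"),
    ("http://example.org/staff/42", "Name", "A. Author"),
]

# A toy schema: the properties that may be used to describe a resource.
SCHEMA = {"Title", "Author", "Name"}


def properties_of(resource, statements):
    """Return the attribute-value view of one resource."""
    return {prop: value for subj, prop, value in statements if subj == resource}


def undeclared(statements, schema):
    """Return properties used in statements but not declared by the schema."""
    return sorted({prop for _, prop, _ in statements} - schema)


print(properties_of("http://example.org/report.html", STATEMENTS))
print(undeclared(STATEMENTS, SCHEMA))
```

The value of `Author` is itself a URI, so the same statement works both as an attribute of the report and as a relationship between two resources, which is why the model can resemble an entity-relationship diagram.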

RDF utilizes XML (Extensible Markup Language) [17] as a common syntax for
the exchange and processing of metadata. The XML syntax provides vendor inde- 
pendence, user extensibility, validation, human readability, and the ability to repre- 
sent complex structures. By exploiting the features of XML, RDF imposes a structure 
that provides for the unambiguous expression of semantics and, as such, enables the 
consistent encoding, exchange, and machine-processing of standardized metadata. 
RDF also uses the XML namespace [18] facility to precisely associate each property 
with the schema that defines the property. A specific metadata set will be used to 
expose the interfaces of the modules/tools in the framework, thus providing the 
mechanism for their interoperability. 
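
As an illustration of how XML namespaces tie each property to the schema that defines it, the sketch below parses a small RDF/XML fragment with Python's standard `xml.etree.ElementTree` module. The described resource and its values are hypothetical; the two namespace URIs are those of the RDF syntax and of the Dublin Core element set, and `read_metadata` is an illustrative helper rather than a tool of the framework.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical description of one data resource using Dublin Core properties.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/report.html">
    <dc:title>Annual Report</dc:title>
    <dc:creator>A. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def read_metadata(rdf_xml):
    """Map each described resource URI to {namespace-qualified property: value}."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # ElementTree expands prefixes, so each tag reads "{namespace}localname".
        result[about] = {child.tag: child.text for child in desc}
    return result


for uri, props in read_metadata(RDF_XML).items():
    print(uri, props)
```

Because the parser expands every prefix to its full namespace, two schemata that both define a `title` property remain distinguishable after parsing.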

In summary, the design and implementation of the framework are based on the
widely accepted standards of RDF, XML and URL (R2). Metadata Sets (semantic
schemata) are specified by RDF Schemata, the syntactic constraints (of metadata and
structured DRs) by XML Schemata and the actual metadata and structured DRs are
stored in XML. RDF and XML schemata are also valid XML files (according to their
DTDs), enabling the application of XML content management tools to both RDF and
XML schema management. XML technologies are constantly evolving and most
prominent companies rush to support them in their products. This raises another
strong requirement for developing and maintaining a modular framework, where the
developer should be able to test the emerging technologies and tools in practice, and
decide which to adopt.

Our approach consists of four layers: Data Layer, Information Layer,
Application Layer and Implementation Layer (see Figure 4). The separation of layers
facilitates the application of various development methods and conceptual models (R4).







Fig. 4. The Modules of the proposed Framework 



4.1 Data Layer 

This layer provides management facilities for existing data resources (esp. hypermedia
content). It provides a common way to access heterogeneous resources in a manner
that is suitable for web applications. It converts the raw information into
self-describing content, facilitating the importing or mapping of existing data resources
into a web application. We define a DATA-UNIT (DU) as a self-describing DR. A DR
and the sets of Metadata related to it form a DU. The main goals of this layer are to (a)
provide services for querying and manipulating heterogeneous resources and (b)
facilitate the exchange and reuse of resources among developers.
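
A Data Unit, as defined above, pairs a resource with the metadata sets that describe it. The following is a minimal sketch of that pairing, assuming metadata sets are simple dictionaries keyed by a set name; the class layout, the `describe` and `find_units` helpers and the example URIs are illustrative assumptions, not the framework's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """A data resource (identified by URI) plus the metadata sets describing it."""
    uri: str
    metadata_sets: dict = field(default_factory=dict)  # set name -> {property: value}

    def describe(self, set_name, **properties):
        """Attach or extend one metadata set, e.g. a descriptive set."""
        self.metadata_sets.setdefault(set_name, {}).update(properties)
        return self


def find_units(units, set_name, prop, value):
    """Query heterogeneous resources through their metadata only."""
    return [u for u in units if u.metadata_sets.get(set_name, {}).get(prop) == value]


units = [
    DataUnit("http://example.org/a.html").describe("descriptive", title="A", media="text"),
    DataUnit("http://example.org/b.mpg").describe("descriptive", title="B", media="video"),
]
print([u.uri for u in find_units(units, "descriptive", "media", "video")])
```

The query never opens the underlying HTML or video file, which is what allows heterogeneous resources to be handled in a common way.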

The already existing resources, produced in the context of previous application
development, are application-oriented, and in order to be reusable they ultimately have to be
transformed into “neutral” resources (context-free content and structure). For
instance, parts of text may have to be rewritten; the presentation has to be separated
from the content (e.g. the transformation of an HTML file to XML/XSL files); the
internal structure of the resource and the relationships with other resources have to be
revised (e.g. when an HTML page is moved from one Web site to another, the link to
the home page must change), etc. This is neither an easy nor a trivial task. The general
guideline for facilitating the reusability of the resources is to keep textual information
as context-free as possible and in structured forms (e.g. XML, DB, etc.) and
multimedia information (video, audio, images) in high resolutions and large sizes.

The Data Layer provides the following three modules. Next to the name of each 
module, we indicate the basic requirement(s) that the module meets. 

M1. Conversion Module. (R5) A primary service is the transformation of
unstructured textual information (HTML, PDF, MS DOC, etc.) to (semi) structured
types (e.g. XML, SGML, DB, etc.). This will provide the ability to query
data, refer to a part of the resource, reuse the whole or part of a resource,
etc. This service will be based on converters or more generic purpose tools, like
structural and text pattern-matching software.

M2. Metadata Module. (R5, R6) A set of tools for specification, authoring and man- 
agement of metadata for the DRs. It should support automatic extraction of se- 
lected metadata from the actual DR and strong management features for adapt- 
ability to future metadata standards. Moreover, the information on how to access 
or query structured DRs (even a DB) is exposed using metadata information. Fi- 
nally, the relationships among DRs are specified through special metadata. 
Metadata information is stored in XML and it is based on RDF. 

M3. XML Module. (R1, R2, R6, R7) This is the core of the framework. Essentially,
this module is an XML Toolkit. Because of its importance and size we introduce 
the following sub-modules. Each sub-module consists of several tools. The users 
of the framework should select one of these tools according to their needs and 
the features of the tools. 

M3.1. XML-DB Module. This module consists of a set of tools that can transform
data and designs from DB to XML or vice-versa. Different approaches are
needed according to the complexity of the designs and the type of data (e.g.
binary data). XML and DTD can directly be mapped to object-oriented and
hierarchical databases and can be mapped to relational databases using traditional
object-relational mapping techniques. Moreover, there are plenty of techniques
and tools for transferring a database schema to a DTD or XML Schema. For
instance, a promising standard is XMI (XML Metadata Interchange Format)
that facilitates the exchange of UML (Unified Modeling Language) models and
MOF (Meta Objects Facility) meta models among applications.

M3.2. XML Parser Module. Important features: validation on DTD or XML Schema,
support of DOM, SAX, XPATH, XPOINTER, XLINK, XCatalog, etc.

M3.3. XML Editor Module. Important features: restrict editing according to DTD or
XML Schema, validation on DTD or XML Schema, editing of DTD, XML
Schema, XSL, RDF, etc.

M3.4. XSLT Processor Module. Important features: Support of XSLT 1.0,
input/output in DOM and SAX, multiple output documents, etc.

M3.5. XSLT Formatting Objects Module. The main feature is the set of supported output
formats.

M3.6. XML Management Module. Important transformations: XML to DTD, XML to
XML Schema, DTD to XML Schema, XML Schema to DTD, XML_A to
XML_B (by example), XML of DTD_A to XML of DTD_B. Other important
management operations include: Pattern match/replace, find/manage tree dif- 
ferences among XMLs, etc. Moreover, utilities over textual information could 
be incorporated in this module, like spelling/grammar checkers, thesaurus, 
tools for semantic analysis and automatic extraction of relationships etc. 

M3.7. XML Query Module. This module will provide querying and navigation
mechanisms over the Data Units. The query mechanism could be applied on 
the metadata and/or the (semi) structured DRs like DBs and XML documents. 
Moreover, an XML reference mechanism should be supported to facilitate the 
mapping of DUs to application resources. Some features include the support of 
XPATH, XML Query (XQL), XML-QL, etc. 
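
Two of the sub-modules above, the XML-DB Module and the XML Query Module, can be illustrated together with Python's standard library. The sketch exports a relational table to XML and then queries the result with the XPath subset that `xml.etree.ElementTree` supports. The table, its columns and the `table_to_xml` helper are hypothetical, and the simple row-per-element mapping is only one of the possible approaches; it does not handle binary data.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Export every row of `table` as <row> elements with one child per column."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name is trusted here
    columns = [col[0] for col in cursor.description]
    root = ET.Element(table)
    for values in cursor:
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, values):
            ET.SubElement(row, name).text = str(value)
    return root


# Hypothetical in-memory database standing in for an existing data resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, media TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("Annual Report", "text"), ("Site Tour", "video")],
)

documents = table_to_xml(conn, "documents")

# ElementTree supports a subset of XPath, including child-text predicates.
videos = [row.findtext("title") for row in documents.findall("./row[media='video']")]
print(videos)
```

Once the data is in XML, the same query mechanism applies whether the content originated in a database or in a document.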

Obviously, rigorous testing and evaluation (R8) is required for the proper selec- 
tion, tailoring and integration of tools within a module, as well as for the seamless 
interconnection of modules within the layer. This is the responsibility of the sys- 
tem/framework developer, who should construct a robust environment using individ- 
ual parts, while paying special attention to topics such as interoperability, support for 
multiple languages, usability of the system by the content authors. The same principle 
holds for all layers, so that the results of each layer/module/submodule fulfill the 
requirements of the framework. 

Additionally, Data Units should have well-prepared content and metadata, easily 
comprehensible to the end-user. This is the responsibility of the content authors, who 
can be assisted by tools such as statistics generators for missing (required or not) data 
fields, syntax checkers, content complexity analysers. 

4.2 Information Layer 

In this layer the developer analyses the information space of the application domain 
and designs the information structure using the most appropriate conceptual model 
(OO, ER or LDG) and the available design patterns. There is no concern for the types 
of users and tasks, only for the information domain semantics (ontologies). After- 
wards, the developer can implement the design through a design tool (RDBMS, 
OODBMS, XML Schema Editor, etc.). The data stored in the instance of the design is 
either authored by the information provider or extracted from the DUs of the Data 
Layer (through the XML Query Module over the metadata and/or data of the DUs). 
This facilitates the importing/reusing of existing information and the rapid informa- 
tion structure development. The structure and the information itself should be con- 
text-free (as far as possible), in order to increase the reusability of the information. 

The design tools keep the design and data of the model in their native format. In 
order to facilitate the reuse of data and designs developed with different models, the 
solution is to have a common format that can express all the conceptual models in a 
way that supports interoperability, manageability, reusability and machine under- 
standing (R3). This common format is XML Schema. With XML Schema the devel- 
oper can import, for instance, an LDG design in an OO design and then manipulate it 
with OO design capabilities. On the other hand, the developer will be able to convert, 
e.g. an OO design to ER or a LDG, losing of course much of the powerful OO capa- 
bilities but making the designing much simpler. The conceptual design in XML 
Schema and the corresponding data in XML comprise an INFORMATION-UNIT 
(IU), because it can be easily transferred among developers and applications as a unit. 

Regarding the issues of QA (R8), the same principles as in the Data Layer hold. 

The Information Layer provides the following modules (which should co-operate 
closely with the XML Module, M3): 

M4. Conceptual Model Design Module. (R3, R4, R6, R7) A tool for supporting the 
design and storing of the conceptual model (e.g. OODBMS, RDBMS, XML 
Schema Editor). It should support (a) importing of designs in XMI, XML 
Schema or DTD, (b) the association (importing or mapping) of data elements of 
the design (objects attributes, rows of tables or XML elements) to DUs metadata 
or data through the XML Query Module or XML Parser Module (by query or 
reference), (c) authoring of new content and updating of existing through an 
authoring environment that may be automatically derived from the design of the 
conceptual model, (d) management operations on the design of the information 
space and propagation of these to the data of the model (e.g. merging of two at- 
tributes of an OO class, or elements of an XML schema, etc.) and support of 
management operations on data - both content and structure. Thus, important 
features include: Design importing/exporting from/to XMI, XML Schema and 
DTD, Data importing/exporting from/to XML, evolution of the design and 
propagation to data, efficient authoring environment etc. 

M5. Access Module. (R4) This module provides querying, navigation and reference 
mechanisms over the data of the Information Layer. Important features include: 
a query service on the conceptual models (e.g. SQL for ER) and a reference 
mechanism for the data of the conceptual model. The XML Query Module can 
be used also for accessing the XML format of design and data of Information 
Layer. 
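
The two access paths described for M4 and M5 (a query service on the conceptual model, and an XML view of the same data) can be sketched as follows. This is an illustration under stated assumptions, not part of the framework: the table artifact, its columns and the helper rows_to_xml are hypothetical, and an in-memory SQLite database stands in for whichever RDBMS holds the ER model.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection, table, columns):
    """Export the rows of one table as an XML element tree.

    Table and column names are interpolated directly, so they must come
    from the design itself and never from user input.
    """
    root = ET.Element(table + "s")
    query = "SELECT {} FROM {} ORDER BY {}".format(
        ", ".join(columns), table, columns[0]
    )
    for row in connection.execute(query):
        record = ET.SubElement(root, table)
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = str(value)
    return root


# Hypothetical ER instance held in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE artifact (title TEXT, period TEXT)")
connection.executemany(
    "INSERT INTO artifact VALUES (?, ?)",
    [("Vase", "Classical"), ("Amphora", "Archaic")],
)

exported = rows_to_xml(connection, "artifact", ["title", "period"])
xml_text = ET.tostring(exported, encoding="unicode")
titles = [element.text for element in exported.findall("./artifact/title")]
print(xml_text)
```

Once exported, the data can be reached through the same path expressions used for Data Units, which is the point of keeping an XML format alongside the native one.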

4.3 Application Layer 

In this layer the developer analyses the types of intended users of the application and 
their tasks and designs the navigational structure (application logic of a hypermedia 
application) over the information space. For the navigational structure, we adopt the 
OOHDM development method, where it is described in terms of a navigational 
schema and a context diagram. 

The schema should support at least the basic concepts of hypermedia systems i.e. 
nodes, links, anchors and access structures (such as indexes, guided tours, etc.). 
Nodes and links represent logical views on the conceptual data model defined in the 
Information Layer. By defining the navigational semantics in terms of nodes and links, 
we can model movement in the navigation space independently of the conceptual 
model, while modifications to the conceptual model can have an impact on the 
navigational one, which the environment should propagate automatically. 

Once the navigation classes have been decided, it is necessary to structure the 
navigation space that will be made available to the user. In OOHDM this structure is 
defined by grouping navigation objects (i.e. nodes, links, context classes and other 
nested navigational contexts) into sets called contexts. The navigation structure of the 
application is defined in a context diagram, which shows all the access structures and 
contexts defined for this application, and the possible ways of navigation between 
them. 

The navigational schema and the context diagram can be designed by using the 
most appropriate conceptual model. Design patterns describing known navigation 
solutions may be applied. The Application Layer uses the Conceptual Model Design 
Module of the Information Layer. The data stored in the instances of the designs is 
extracted from the Information Layer, through its Access Module. On the other hand, 
the Access Module of the application layer provides querying and reference mecha- 
nisms over the data of this layer. 

The data and designs of this layer, when represented in XML/XML Schema, com- 
prise an APPLICATION-UNIT (AU), because it can be easily transferred among de- 
velopers as a unit and can be re-used in various implementation environments. 

To enhance the quality of application units, unnecessary complexity in the context 
diagrams should be avoided. Metrics such as mean number of steps until the user 
finds the most "obscure" information can be calculated and utilized, e.g. in order to 
determine the granularity of the categories of a thematic catalog, on top of specific 
content. 
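
One way such a metric could be computed is sketched below in Python. The context diagram is modelled as a plain adjacency mapping, and a breadth-first search gives the minimum number of steps from the entry point to every page. The page names, the LINKS mapping and the choice of averaging over leaf pages are assumptions for illustration; other definitions of "obscure" are equally possible.

```python
from collections import deque


def steps_from_entry(links, entry):
    """Breadth-first search: minimum number of steps from entry to each node."""
    distance = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for target in links.get(node, []):
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return distance


# Hypothetical context diagram for a small thematic catalog.
LINKS = {
    "home": ["catalogue", "search"],
    "catalogue": ["sculpture", "pottery"],
    "sculpture": ["statue-1"],
    "pottery": ["vase-1", "vase-2"],
}

distance = steps_from_entry(LINKS, "home")
# Leaf pages (no outgoing links) are taken here as the content pages.
leaves = [node for node in distance if node not in LINKS]
mean_steps = sum(distance[node] for node in leaves) / len(leaves)
# The deepest page is the most "obscure" one under this definition.
max_steps = max(distance.values())
print(mean_steps, max_steps)
```

Splitting a category into finer sub-categories adds a level and raises both figures, so the two numbers give a rough handle on the granularity trade-off.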

4.4 Implementation Layer 

In this layer, we address issues of the applications’ user interface (UI) that are obvi- 
ously based on the capabilities of the implementation environment (e.g. WWW- 
HTML pages, WWW-Cocoon Servlet, WWW-ASP, a CD-ROM, etc.). In this layer 
the developer specifies interface aspects and behavior of navigational classes (nodes, 
links, access structures) according to the selected implementation environment. The 
Implementation Layer provides the following module: 

M6. Implementation Module. (R7) Tools for supporting the implementation of the 
interface aspects and behavior of the Navigational Classes, using technologies 
provided by the implementation environment (e.g. for the WWW-HTML 
through HTML templates that contain HTML code mixed with references 
and/or queries to the Navigational Schema and the Context Diagram). For the 
Web, such tools could be HTML editors, JavaScript debuggers, Java 
development environments, etc. 
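
The idea of an HTML template that mixes markup with references to the navigational schema can be sketched with Python's standard string.Template. The placeholder names ($title, $links), the node dictionary and render_node are hypothetical; a real implementation environment such as Cocoon would use its own template mechanism.

```python
from html import escape
from string import Template

# HTML code mixed with references ($title, $links) to a navigational node.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><ul>$links</ul></body></html>"
)


def render_node(node):
    """Fill the template with one node's title and its outgoing links."""
    links = "".join(
        '<li><a href="{}.html">{}</a></li>'.format(
            escape(target, quote=True), escape(label)
        )
        for target, label in node["links"]
    )
    return PAGE_TEMPLATE.substitute(title=escape(node["title"]), links=links)


# Hypothetical navigational node with two anchors.
node = {
    "title": "Sculpture & Reliefs",
    "links": [("statues", "Statues"), ("reliefs", "Reliefs")],
}
page = render_node(node)
print(page)
```

Because the node carries no markup of its own, the same data could be rendered by a different template for another implementation environment.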

Especially important is the implementation of unambiguous user interfaces (R8). 
The tasks that the user can perform should be crystal clear and workflow should be 
straightforward. Help should be one click away and the use of wizard-driven user 
interaction offers significant advantages, especially for mainstream visitors. An ap- 
proach that can be followed by implementers is the task-centered user-interface de- 
sign [19], which emphasises analysis of both tasks and users. The same approach can 
be employed by the system/framework developers in the creation of the other layers' 
UI, which in effect are the framework front ends for content and application authors. 

4.5 Collaboration Module 

Additionally, we introduce a general module, which facilitates the collaborative work 
over the resources of the framework. 

M7. Collaboration Module. (R7) A set of collaborative tools that enable people to 
work together during the design, development and maintenance phases. This 
module maintains the relationships between a working group’s members, tasks 
and information and provides services that may include: multi-user access, 
access control, member activity tracking, versioning, notification control, etc. 

4.6 Overall Testing and Quality Assurance 

Even though we attempt to provide QA within the layers of the framework, there is 
always the need to evaluate the final outcome, which can be a content category of a 
Web site or an entire application. To assess the quality of WWW site content, one can 
measure criteria such as easy navigation, improved availability and precision of in- 
formation, enhanced guidance of the user throughout the navigation process. We 
summarise several practical approaches which, in addition to traditional testing by 
closed user groups, can highlight, among others, problems in content structure, navi- 
gation, flow and interaction of the application: 

• Use of log analyzers, parsing the paths of users throughout content and applica- 
tions and measuring data such as number of hits, depth of traversal graph, time 
(links) spent within the site, time (links) spent within specific categories, traversal 
patterns. 

• On-line questionnaires for the users, requesting information on the usability of the 
site, attempting to evaluate and measure overall user satisfaction 

• Features (mini-polls) embedded in the content/application pages, such as simple 
"voting" for correctness of the results of an operation (e.g. "See Also" links) and 
for rating the style and overall quality of the content presented (e.g. from 1 to 5) 
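
The first of these approaches, log analysis, can be illustrated with a short Python sketch. The log format (a time-ordered list of session identifier and requested path), the sample entries and the function names are assumptions; real server logs would first need parsing and session reconstruction.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log: (session id, requested path) in time order.
LOG = [
    ("s1", "/"),
    ("s1", "/culture"),
    ("s1", "/culture/museums"),
    ("s2", "/"),
    ("s2", "/sports"),
    ("s1", "/culture/museums/acropolis"),
]


def segment_count(path):
    """Depth of a page, taken as the number of non-empty path segments."""
    return len([segment for segment in path.split("/") if segment])


def analyse(log):
    """Return hits per page, links followed per session, depth per session."""
    hits = Counter(path for _, path in log)
    sessions = defaultdict(list)
    for session, path in log:
        sessions[session].append(path)
    # Each page after the first in a session corresponds to one link followed.
    links_followed = {s: len(paths) - 1 for s, paths in sessions.items()}
    depth = {s: max(segment_count(p) for p in paths) for s, paths in sessions.items()}
    return hits, links_followed, depth


hits, links_followed, depth = analyse(LOG)
print(hits["/"], links_followed, depth)
```

Measuring "time" in links, as done here, matches the parenthetical in the first bullet and avoids depending on timestamps.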

Problems detected can be solved on a modular basis, depending on the layer which 
is involved. Additionally, this new knowledge can be embedded within the frame- 
work (e.g. in a design pattern), in order to improve the performance of the design and 
implementation process. 

5 Implementation 



To implement the proposed framework, the following tasks have been scheduled and 
are in progress: 

• For each module, review of the available technologies and tools (continuous task). 

• Selection of the most appropriate ones and tailoring to support the interfaces of the 
modules (see Table 1 for tools that have already been selected and tested). 

• Implementation of the glue logic and the Web interface of the framework. The 
framework is running under an Apache Web server (in Windows NT and Solaris 
2.7), through the Cocoon Servlet (Apache XML Project), and will soon provide a 
robust set of XML-related libraries and supportive tools. Perl and Java are used as 
the scripting/programming languages, because of their ease of use, wide-spread 
appeal and the large (and ever increasing) library of available code. 

• Testing of the framework with various methods, models and on different applica- 
tion domains. 



Table 1. Tools already incorporated in the Framework 



Module | XML Apache         | IBM XML                                   | OTHER
       | (xml.apache.org)   | (http://www.alphaworks.ibm.com/)          |
-------+--------------------+-------------------------------------------+------------------------------------------
M1     |                    |                                           | Rhythmyx XSpLit (www.percussion.com)
M2     |                    |                                           | SiRPAC (www.w3c.org)
M3.1   |                    | XLE, Visual XML Creation                  |
M3.2   | Xerces Java        | XML4J 3.01                                | XP (www.jclark.com)
M3.3   |                    | Xeena 1.1, Visual DTD                     | XML Spy (www.xmlspy.com)
M3.4   | Xalan Java         | Lotus XSL                                 | XT (www.jclark.com)
M3.5   | FOP Java (for PDF) |                                           |
M3.6   |                    | DdbE, XtransGen, Visual XML               |
       |                    | Transformation, PatML, XMLTreeDiff, X-IT  |
M3.7   |                    | Visual XML Query                          |
M4     |                    | XMI Toolkit                               |
M6     |                    |                                           | Allaire HomeSite (http://www.allaire.com)
M7     |                    |                                           | RCS, MS SourceSafe


As application domains we have selected Culture (the primary Web site of the 
Hellenic Ministry of Culture, “ULYSSES” at http://www.culture.gr) and Sports (the 
primary Web site of General Secretariat of Sports in Greece), which both feature: 

• extensive content, ranging from thousands of static HTML pages to integrated DBs 
(e.g. movable artifacts of the Acropolis of Athens) 

• various Web applications, including thematic catalogues, user-profile based navi- 
gation, search systems, metadata-based search systems, geographical map-driven 
navigation. 

Using the framework and the guidelines presented in this work, we are proceeding 
with the restructuring of existing applications and content from both sites, into com- 
ponents that are gradually incorporated within the joint framework. 

Significant progress has been achieved within the last months. Entire content cate- 
gories of the WWW sites have been remodeled (e.g. transformation of more than 
1500 existing HTMLs with associated metainfo into an E-R DB). Common metadata 
sets have been added to content from heterogeneous sources, including HTML-XML 
document collections and DB stores, thus constructing a pool of content (DUs/IUs) 
and enabling transparent access for Application Units development. 

Prototype AUs have also been implemented, offering services such as guided 
tours, thematic catalogs, site-map generators on top of DUs/IUs. The interfaces to the 
AUs (implementation layer) consist of Java applets, JavaScript & HTML files, and Flash 
applications. The new version of ULYSSES is expected to be online by Fall. 



6 Conclusions 

In this work we presented the current status of Web development and outlined the 
major problems, current or upcoming, that a Web developer faces. In spite of numer- 
ous tools and technologies, several basic requirements of developers are still not be- 
ing met. We summarized these requirements and proposed a more formal framework 
that could support them. We emphasized the issues of content/application reuse, 
ease of maintenance and interoperability. 

This framework builds on a set of guidelines that a developer should always con- 
sider, and intends to provide a generic development support environment. Its basic 
characteristics are: scalability (modular architecture formed by modules/tools) and 
technology independence (ability to replace tools in order to incorporate forth-coming 
techniques or technologies). The core technologies are XML and RDF. 

The viability of this framework is already being explored through an implementa- 
tion scenario, which is based on the content and application/services of two of the 
largest Web sites in Greece. This process also allows us to examine various technical 
solutions for the diverse modules and services of the framework. This hands-on 
experience and alternate technical implementations of the framework will also be 
disseminated to the Web developers’ community. 

Development Tools, Skills and Case Studies 



1 Overview 

The processes and methodologies which make a discipline practical need to be 
supported by appropriate tools, and skills on the part of its practitioners. At times, the 
existing tool set proves to be either inadequate or wholly unsuited to the tasks in hand. 
The usefulness of any tool becomes clear only as people develop solutions to the 
perceived problems. The scale and novel applications in the Web arena provide a 
fertile area for new problems and problem solving. The volatile nature of the 
technologies associated with the Web virtually guarantees that new problems will crop 
up constantly and will lead to the fashioning of new tools to solve those problems. 
The papers in this section focus on the tools and skills that were needed in specific 
situations and represent but a small amount of the total effort in this area. 

The first paper, Synthesis of Web Sites from High Level Descriptions, deals with 
the automation of design of Web sites in a novel way. As use of Web sites has 
exploded, a large amount of effort has gone into the deployment of sites but little 
thought has been given to methods for their design and maintenance. The paper 
reports some encouraging results on the use of automated synthesis, using domain- 
specific formal representations, to make design more methodical and maintenance 
less time consuming. 

The next paper, Meta-XML Specification Language, tackles the problem of 
maintaining consistency in Web sites. Documents such as legal documents, medical 
records and instruction manuals have relationships between individual documents 
and are also authored by many people over time. People are not good at consistently 
following sets of rules. XML can assist Web page authors by ensuring that Web 
pages of the same type have the same kinds of semantic content. The paper presents 
an XML extension that describes relationships between XML documents and shows 
how such an extension helps to capture business rules. 

The third paper, Engineering of Two Web-Enabled Commercial Software Services, 
reports on two case studies where an ingenious application of new technology led to 
improved solutions to old problems. The paper explains the compelling reasons for 
developing the two applications using the Web architecture and the advantages of 
Web-based services over traditional methods. It also highlights some of the trials of 
developing, testing, marketing, and maintaining complex web-enabled services. The 
authors, working in a research laboratory which licenses its products to users, find 
that the Web provides an easy way to reach clients quickly. In traditional standalone 
systems, clients license the whole system, install it on their machines, and use it to 
perform the required operations. Developing, documenting, drumming up support, 
training, and consulting can be a burden for a research team since they are not 
necessarily equipped with the resources to do this. Web-based services reduce time to 
market, make it easier to add new features and modify the old ones, and none of the 
changes requires a massive rework of the design of the system. Prototyping is easier 
to do. Web-based delivery allows large-scale usage trials to be easily facilitated. 
Additionally, there can be protection of intellectual property against reverse 
engineering and/or decompiling. A Web-based service can also result in a price an 
order of magnitude lower than a traditional system. 

The fourth paper, A Skills Hierarchy for Web Information System Development, 
addresses the problem of skills needed, especially for Web-based Information System 
Development. A Web Information System (WIS) typically involves many more 
‘players’ or users than a traditional IS, necessitating better understanding of all types 
of users. A WIS also requires different and varied skills from the users since it is 
likely to be used more widely than the existing IS and also since it would interact with 
other, existing IS to collect, collate and disseminate information and functionality. 
The paper proposes a three dimensional skills space to locate any given user, the 
dimensions being management, technical and human interaction skills. The 
management skills include skills to coordinate, regulate and integrate the WIS with 
the organisation and with other IS within the organisation with which it would 
interact. The technical skills include computing, networking and internet 
communication skills. The human interaction skills are associated with graphic 
design, layout, human communications and presentation skills. 

The paper discusses six types of users, ranging from naive end-users to 
experienced ones who may create or be responsible for development and maintenance 
of their own Web sites/applications within the overall structure of the organisation. 
The subsequent analysis of the required skills level is shown to help in determining 
job descriptions, levels of technical expertise for consultants, assessment of the skills 
needed in a WIS development team and for training of the users at all levels. 

The last paper, A Case Study of a Web-Based Timetabling System, is a case study 
of a successful application. There are a number of alternative technologies and 
architectures that build bridges between Web and enterprise databases. A Web 
developer’s choice is influenced by factors such as the complexity of data, the 
speed of deployment, the expected number of simultaneous users and the frequency of 
database updates. The paper describes the experience and lessons learnt in putting a 
timetabling system on the Web. 

Abstract. As use of Web sites has exploded, a large amount of effort 
has gone into the deployment of sites but little thought has been given 
to methods for their design and maintenance. This paper reports some 
encouraging results on the use of automated synthesis, using domain- 
specific formal representations, to make design more methodical and 
maintenance less time consuming. 

Keywords: Web site application, computational logic, HTML. 



1 Introduction 

Web site maintenance has become a challenging problem due to the increase 
in size and complexity of Web site applications. It often involves access to 
databases, complex cross referencing between information of the site and so- 
phisticated user interaction. 

Web site applications related to the same domain often share a common pattern. 
Consider, for example, the Software Systems and Processes Group at Edinburgh 
(www.dai.ed.ac.uk/groups/ssp/index.html), the research group Web sites 
of the Artificial Intelligence Research Institute in Barcelona (www.iiia.csic.es) 
and the Artificial Intelligence Laboratory at MIT (www.ai.mit.edu). Although 
they look quite different, the underlying application design is very similar, par- 
ticularly in information content. We would like to exploit these similarities, for 
example in reusing application components for visualisation designs, thus saving 
time in application development. 

One way to achieve this goal is to separate, formally, the information content 
of Web sites from their presentational form. Separation of content from 
visualisation aspects at early design stages is important in order to allow a 
description of the site's essential content, such as data, operations, information flow 
within the site and constraints on the information flow. These features are mingled with 
presentational descriptions if we work directly in HTML [8] code. 

A traditional way of representing information processing abstractly is through 
computational logic. Although logics provide a powerful framework for applica- 
tion description, few people feel comfortable when using them directly as a tool. 

* On leave from University of Amazonas, Brazil. 

This problem can be overcome by designing domain or task-specific dialects of 
a logic which are adapted to the informal styles of description used in the ap- 
plication, while also supporting automatic translation to less intuitive formal 
representations needed for computation. This leads to different levels of rep- 
resentation in which a mapping from higher levels to the lower ones will be 
required. Hence, the Web site generation process is organised in different levels, 
from a high level description going through an intermediate representation to 
the resulting site code. 

This work describes an approach to design and maintenance of Web site applications 
which is based on a simple form of computational logic. The key feature 
of the proposed approach is separating the site information content from its 
presentational form and deriving the Web site code from its content descrip- 
tion via automated synthesis. The main benefit of the approach is the ability to 
work separately on the application and visualisation specifications, allowing up- 
dates in the application content without necessarily changing the presentational 
form and also changing the visualisation of the site without any changes in the 
specification of the site content. 

Parts of this task have been addressed by others. WebMaster [7] is a tool 
that can be used for constraint checking on Web Sites based on rules expressed 
in logic. Strudel [5] is a Web site management system, which generates Web 
site code from data residing in a database via a SQL-like language. Fernandez 
et al [6] addressed the problem of specifying and verifying integrity constraints 
on a Web site, based on a domain specific description language and creating a 
graph structure to represent the site. This is done in a fashion very similar to 
our work, but we also have added representation and automated generation of 
operations (CGI programs), which is not considered by Fernandez et al. 

The next section introduces the 3-level approach, section 3 presents a working 
example, section 4 describes the problem description language, section 5 presents 
the synthesis process for the example Web site. 

2 A Three-Level Approach 

A Web site application is a collection of pages where each page consists of infor- 
mation content and links. Information content often is described from a database. 
Links correspond to transitions between pages and there are two different ways 
of making a transition: via a hyper-link between two pages or by an operation 
call. An operation is a program that may receive some input arguments from 
the first page and displays the result of its computation in the second page. We 
implement operations as CGI programs [4]. From this point onwards we refer 
to hyper-links just as links since we have already made the distinction between 
them and operation calls in the context of transitions. 

Our approach to the design and maintenance of Web sites is based on com- 
putational logic. However, since the designer is not supposed to work directly 
with logic, the synthesiser is arranged in three different levels as illustrated in 
Figure 1. The high level description should be provided by the designer by means 
of an appropriate interface. Having this initial description as a starting point, an 
intermediate representation for the application is built using a domain-specific 
formal language. The Web site code is automatically generated from the inter- 
mediate representation. The key idea in this approach is separating information 
content from its presentational form. Hence, a description of application com- 
ponents such as data and operations can be produced regardless of its future 
rendering in a Web browser. 




Making this specification in a declarative way, independent of 
implementation/visualisation details, allows a great deal of flexibility in choosing 
appropriate styles for visualisation and different implementations for the 
operations. Our paper concentrates on the left side of the diagram in Figure 1, 
explaining how the intermediate representation is used in code synthesis. A brief 
discussion on visualisation issues is given in section 4.4. 

3 An Example 

As a working example, consider a research group Web site like the ones mentioned 
in the introduction. The site should display information about the group, such 
as an introduction about the group aims and activities, members of the group 
and their publications and projects which have members of the group involved. 
Note that this is a subset of a real research group Web site which may include 
more information than that described here. 

Formal representation of our example site begins with the domain-specific 
language used to describe the basic elements of the research group. This is done 
in conventional predicate logic, but it is equally easy to think of this as 
a database of relevant information. For instance, the database of the research 
group Web site can be described by the following set of predicates, where the 
argument names give a guide to the type of data structure to which each would 
be instantiated in the actual database. 

group_aims(Aims). 
contact_Info(E_mail, Address, Postcode, City). 
person(Name, Status). 
project(Title, Abstract, People_Involved). 
publication(Title, Authors, Reference, Year, Abstract). 
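
To make the "database of relevant information" reading concrete, two of these predicates can be mirrored as Python records. This is only an illustrative encoding, not the representation used by the synthesiser: the sample names, the status values "current" and "previous", and the two helper functions are assumptions.

```python
from collections import namedtuple

# One record type per predicate; field names follow the argument names.
Person = namedtuple("Person", ["name", "status"])
Project = namedtuple("Project", ["title", "abstract", "people_involved"])

# Hypothetical facts, i.e. instances of person/2 and project/3.
PEOPLE = [
    Person("Alice", "current"),
    Person("Bob", "previous"),
    Person("Carol", "current"),
]
PROJECTS = [
    Project("Site synthesis", "Generating sites from descriptions.", ["Alice", "Carol"]),
]


def members_with_status(people, status):
    """Names of all people whose status matches, in database order."""
    return [person.name for person in people if person.status == status]


def projects_of(projects, name):
    """Titles of all projects in which the named person is involved."""
    return [project.title for project in projects if name in project.people_involved]


current_members = members_with_status(PEOPLE, "current")
alice_projects = projects_of(PROJECTS, "Alice")
print(current_members, alice_projects)
```

A query such as members_with_status corresponds to solving person(Name, Status) with Status fixed, which is the kind of retrieval a page listing current members would need.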

We want to build a Web site structured like the diagram in Figure 2. 




Fig. 2. A general representation for a research group Web site 



Each box corresponds to a page of the site and arrows correspond to transi- 
tions between pages. Our next step is to describe a possible distribution of data 
among the pages of the site. 

The home page should present the aims of the group and contact information. 
In the people page we want to show two different lists, one containing the names 
of the current members and the second with names of previous members of the 
group. The publications page presents a list with all publications of the group 
members. A list of project titles is displayed on the project page and for each 
project there is a specific page presenting its title, abstract and people involved. 
In the following discussion we shall introduce a formal representation of these 
requirements and then use these to generate a Web site. 

The relation between information displayed on a page and the research group 
description is expressed by rules. The expression below defines that the group 
aims and contact information are displayed on the home page. 

display(home, [group_aims(X), contact_Info(Y)]) ← 
    group_aims(X) ∧ 
    contact_Info(Y). 

Changes in display of information are represented by transition rules. The 
transition from home to the people page, for example, is represented by the 
following expression: 

display(home, [ ]) ⇒ display(people, [ ]). 

The empty list “[ ]” means that the transition is independent of the infor- 
mation displayed by either page. This sort of expression corresponds to a simple 
hyper-link. 

Similar expressions are used to represent the other pages and transitions. A 
catalogue of these expressions is given in the next section. 
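
As a rough illustration of how the display rule and the transition rule for the home page might be evaluated, the sketch below encodes them as Python data. It is not the synthesiser described in this paper: the dictionaries FACTS, DISPLAY_RULES and TRANSITIONS, the sample values, and the single-argument form of contact_Info (as written in the rule) are assumptions.

```python
# Facts: predicate name -> list of argument tuples (hypothetical values).
FACTS = {
    "group_aims": [("Research on automated Web site synthesis.",)],
    "contact_Info": [("group@example.org",)],
}

# display(home, [group_aims(X), contact_Info(Y)]) <- group_aims(X) and contact_Info(Y).
# Each page maps to the predicates in its information list.
DISPLAY_RULES = {"home": ["group_aims", "contact_Info"], "people": []}

# display(home, [ ]) => display(people, [ ]).  A plain hyper-link.
TRANSITIONS = [("home", "people")]


def display(page):
    """Instantiate each predicate in the page's information list from the facts."""
    return [
        (predicate, arguments)
        for predicate in DISPLAY_RULES[page]
        for arguments in FACTS[predicate]
    ]


def hyperlinks(page):
    """Pages reachable from the given page by a simple hyper-link."""
    return [target for source, target in TRANSITIONS if source == page]


home_content = display("home")
home_links = hyperlinks("home")
print(home_content, home_links)
```

Content and links are computed by separate functions, reflecting the separation between what a page displays and how the user moves between pages.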

4 Describing Web Site Applications 

We define a Web site application in terms of data, operations, transitions and 
constraints on the transitions. The expressions used for this purpose are formally 
described here. 

4.1 Transitions 

Information flow when navigating a Web site can be viewed as a sequence of 
actions, where each action is the display of a set of information. A transition 
moves from one page to another. As mentioned earlier, there are two different 
ways to make a transition: via hyper-links or operation calls. In order to express 
information flow in terms of actions, an operator and two special predicates were 
defined. The following tables explain their meaning. 



Expression                         Interpretation

display(Id, InfoList)              Display InfoList at the page identified by Id.
                                   InfoList is a list of the form
                                   [p1(Info1), p2(Info2), ..., pn(Infon)], where
                                   each pi is a predicate and Infoi a variable
                                   corresponding to a specific piece of information.

satisfy(p(Arg1, Arg2, ..., Argn))  Operation p can be executed. Each Argi is an
                                   argument that can be either input or output to p.



Predicates display and satisfy are combined by an additional connective in
order to express transitions. Some transitions are conditional, where a condition
is defined by a conjunction of predicates. The table below shows different sorts
of transition expressions.

Expression                          Interpretation

display(Id1, [ ]) ⇒                 A transition from page Id1 to page Id2. The
  display(Id2, [ ]).                empty list means that the transition is
                                    independent of any information displayed on
                                    either page.

display(Id1, InfoList1) ⇒           A transition from page Id1 to page Id2 that is
  display(Id2, [ ]) ← C.            associated with the information in InfoList1.
                                    C is a condition comprising a set of
                                    predicates, mainly used to retrieve data from
                                    the database to instantiate pieces of
                                    information.

display(Id1, [p1(A)]) ⇒             A transition from page Id1 to page Id2 made via
  display(Id2, [p2(B)]) ←           the execution of operation foo, given the input
  satisfy(foo(A, B)).               argument A from page Id1. The result B is
                                    displayed on page Id2.



Using the expressions above, the information flow of our Web site can easily 
be described, as illustrated by the example below. 

display(Id1, [p1(Info1)]) ←
    p1(Info1).

display(Id1, [ ]) ⇒ display(Id2, [ ]).

display(Id1, [ ]) ⇒ display(Id3, [ ]).

display(Id2, [p2(Info2)]) ⇒ display(Id4, [p4(Info4)]) ←
    p2(Info2) ∧ satisfy(foo(Info2, Info4)).

A graphical view for this example is depicted by Figure 3. 



Fig. 3. Information flow (nodes: p1, p2, p4)



4.2 Operations 

A library of parameterisable components is used to build the operations of the 
application. Each component corresponds to a different type of operation, such 
as queries, filters, etc., and parameters usually include a name for the operation
and its input and output arguments. The construction of operations is based on a
simplified form of techniques editing [9].

The most common sorts of operation that we have encountered in Web site
applications are queries on databases. The following table presents an initial set 
of components. These are task specific, so more components would have to be 
added to cover a wider range of tasks. Nevertheless, the small set we have now 
is surprisingly versatile. 



Expression                          Interpretation

bl_recursion(P, B, Ns, Nt)          Predicate P defines a recursion over a fact
                                    predicate B, with argument position Ns being
                                    the starting point of the recursion and Nt the
                                    end point.

filter(P, Nd, Nc, test(T))          Predicate P filters elements of a list at
                                    argument position Nd by deconstructing that
                                    list and constructing a list of chosen
                                    elements at argument position Nc. The test
                                    used to filter elements is the predicate
                                    named T.

query(Q, P, ArgsIn, ArgsOut)        Query predicate Q finds all solutions for
                                    predicate P, given a list of input argument
                                    positions ArgsIn and a list of output argument
                                    positions ArgsOut. These arguments refer to
                                    the predicate P, which must be a fact.



These expressions work by instantiating a program implementation pattern
associated with each sort of component. The parameters serve as an interface to
the operation generation process. The resulting instantiated operation is a CGI
program.

For example, the definition:

query(pubs_by_year, publication, [4], [1, 2, 3, 5])

corresponds to a query operation, called pubs_by_year, that performs a query
on the predicate publication. The query returns the values at argument
positions [1, 2, 3, 5]. The argument at position 4 is given as the input parameter.

The following code is the CGI program (in Prolog) corresponding
to this operation:

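% CGI entry point: read the form, run the query and render the result.
main :-
    get_form_input(F),
    get_form_value(F, year, Y),
    pubs_by_year(Y, PubList),
    show_page(PubList).

% Collect, for every publication of year Y, a list of its other four arguments.
pubs_by_year(Y, PubList) :-
    findall([T, A, Ref, Abs], publication(T, A, Ref, Y, Abs), PubList).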

Predicates get_form_input, get_form_value and output_html are given by the
Pillow library [2], which provides facilities for generating HTML code for logic
programming systems, including form handlers and Web document parsing.

The predicates above perform the following tasks:

- get_form_input(F): translates input from the form into a list of
  attribute=value pairs.

- get_form_value(F, Attribute, Value): gets the value Value for a given attribute
  Attribute in list F.

- output_html(T): T is a list of HTML terms that are transformed into HTML
  code and sent to the standard output. HTML terms have a direct
  correspondence to Pillow terms.

The first two predicates are used to process the form and get the parameter 
to call the query operation pubs_by_year. The result (PubList) is given to a specific 
predicate show_page to generate the HTML code corresponding to the resulting 
page. The details of this transformation are discussed in section 4.4. A similar 
pattern is followed to build filtering and recursing operations. 

These descriptions are versatile because they support modifications to the 
implementation (via the program pattern) as long as the interface of the com- 
ponent remains the same. 
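
As a rough illustration of how a parameterised component yields an operation, the sketch below mimics query(Q, P, ArgsIn, ArgsOut) in Python rather than Prolog. It is not the authors' generator: facts are modelled as tuples, argument positions are 1-based as in the text, and the names make_query and PUBLICATION_FACTS, like the sample data, are assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical fact base standing in for the five-argument publication predicate.
PUBLICATION_FACTS = [
    ("Paper A", "Ann", "ref-a", 1999, "Abstract A"),
    ("Paper B", "Bob", "ref-b", 2000, "Abstract B"),
    ("Paper C", "Cat", "ref-c", 1999, "Abstract C"),
]


def make_query(facts: Sequence[tuple], args_in: Sequence[int],
               args_out: Sequence[int]) -> Callable[..., list]:
    """Instantiate a query operation over `facts`.

    args_in and args_out are 1-based argument positions, as in
    query(Q, P, ArgsIn, ArgsOut).
    """
    def query(*inputs):
        if len(inputs) != len(args_in):
            raise ValueError("expected one value per input position")
        results = []
        for fact in facts:
            # Keep the fact only if every input position holds the given value.
            if all(fact[pos - 1] == value for pos, value in zip(args_in, inputs)):
                results.append([fact[pos - 1] for pos in args_out])
        return results
    return query


# Counterpart of query(pubs_by_year, publication, [4], [1, 2, 3, 5]).
pubs_by_year = make_query(PUBLICATION_FACTS, [4], [1, 2, 3, 5])
print(pubs_by_year(1999))
```

Only the parameter lists change from one operation to the next, which is the sense in which the implementation pattern can be swapped while the component interface stays fixed.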

4.3 Constraints 

The constraints considered in our approach are used to enforce an order of infor- 
mation presentation. A very common constraint of this sort appears in electronic 
commerce Web sites, where information about the purchase and the total amount 
must be displayed before the customer provides the payment information. Simi- 
larly, a confirmation of payment must be displayed after checkout. 

In order to specify constraints in the order of information display, we use 
two concepts from Transaction Logic (TR) [1], serial conjunction and path, that 
were adapted to represent the sort of constraints we need. Serial conjunction is 
used to represent a sequence of actions. This is written in the form a ⊗ b to
define a path formed of action a followed by action b.

In the Web site context a path is simply a sequence of information displays.
Hence constraints on a Web site can be expressed in terms of valid/invalid paths.
Paths can be derived from the site graph, where nodes correspond to pages and
edges correspond to transitions. The site graph is easily built by inspecting the
transition expressions. We assume that a finite number of acyclic paths can be
extracted from the transition definitions. Figure 4 shows the graph extracted
from the transition specification example given in section 4.1.
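
The construction just described can be sketched in Python as follows. This is an illustrative reading, not the authors' Prolog implementation: transitions are reduced to (source, target) page pairs, and "acyclic paths" is taken to mean maximal paths from a start page that never revisit a page. The function names are hypothetical.

```python
from collections import defaultdict

# Transitions of the section 4.1 example as (source page, target page) pairs.
TRANSITIONS = [("Id1", "Id2"), ("Id1", "Id3"), ("Id2", "Id4")]


def build_site_graph(transitions):
    """Map each page to the list of pages it has a transition to."""
    graph = defaultdict(list)
    for source, target in transitions:
        graph[source].append(target)
    return dict(graph)


def acyclic_paths(graph, start):
    """Enumerate maximal paths from `start` that never revisit a page."""
    paths = []

    def extend(path):
        successors = [p for p in graph.get(path[-1], []) if p not in path]
        if not successors:
            paths.append(path)
            return
        for page in successors:
            extend(path + [page])

    extend([start])
    return paths


graph = build_site_graph(TRANSITIONS)
print(acyclic_paths(graph, "Id1"))
# [['Id1', 'Id2', 'Id4'], ['Id1', 'Id3']]
```

Refusing to revisit a page keeps the enumeration finite even when the site contains cycles, such as pages that link back to the home page.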



Fig. 4. Site graph (nodes: p1, p2, p3, p4)

The simplest path contains a single element which is a display goal, as defined
earlier. Hence path expressions are of the form:

display(Id1, InfoList1) ⊗ display(Id2, InfoList2) ⊗ ... ⊗ display(Idn, InfoListn)

Another useful concept taken from TR is a special symbol path which corre- 
sponds to a sequence of actions of any length. This concept allows us to write 
simplified expressions. For example, the expression: 

path ⊗ display(Id1, InfoList1) ⊗ display(Id2, InfoList2) ⊗ path

denotes any path that displays the page identified by Id1 which is immediately
followed by page Id2.

The following table presents some common constraint expressions: 



Expression                                   Interpretation

¬(path ⊗ ¬display(p, Info_p) ⊗ path ⊗        Information Info_p must be displayed
  display(q, Info_q) ⊗ path)                 before information Info_q.

¬(path ⊗ ¬display(p, Info_p) ⊗               Information Info_p must be displayed
  display(q, Info_q) ⊗ path)                 immediately before information Info_q.

¬(path ⊗ display(p, Info_p) ⊗ path ⊗         Information Info_q must be displayed
  ¬display(q, Info_q) ⊗ path)                after information Info_p.

¬(path ⊗ display(p, Info_p) ⊗                Information Info_q must be displayed
  ¬display(q, Info_q) ⊗ path)                immediately after information Info_p.



Constraint checking can be done by matching path expressions against
constraint expressions. Note that the special predicate path matches paths of any
length. For example, consider the following paths:

P1: display(Id1, Info1) ⊗ display(Id2, Info2) ⊗ display(Id4, Info4).

P2: display(Id1, Info1) ⊗ display(Id3, Info3).

Now consider the following two constraints:

C1: ¬(path ⊗ ¬display(Id2, Info2) ⊗ display(Id4, Info4) ⊗ path).

"Information Info2 should be displayed before information Info4".

C2: ¬(path ⊗ ¬display(Id3, Info3) ⊗ display(Id1, Info1) ⊗ path).

"Information Info3 should be displayed before information Info1".

From the specifications above it is possible to conclude that constraint C1 is
satisfied. Note that the negation of the constraint means that paths which have
that pattern are not valid. As paths P1 and P2 cannot match the pattern, they
are valid.

On the other hand, constraint C2 is not satisfied because both P1 and P2
match the constraint pattern. Informally, it can be verified that in either path
information Info1 is displayed without Info3 being displayed before it.
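
The check above can be sketched in Python under a simplifying assumption: paths are reduced to lists of page identifiers (the InfoList arguments are ignored), and a negated display step is read informally, as in the text, to mean that the expected page is absent at that point. The function names are hypothetical and this is not the authors' matcher.

```python
def violates_immediately_before(path, p, q):
    """True if some display of q is not directly preceded by a display of p,
    i.e. the path matches  path ⊗ ¬display(p) ⊗ display(q) ⊗ path."""
    for i, page in enumerate(path):
        if page == q and (i == 0 or path[i - 1] != p):
            return True
    return False


def violates_before(path, p, q):
    """True if some display of q occurs with no earlier display of p."""
    seen_p = False
    for page in path:
        if page == p:
            seen_p = True
        elif page == q and not seen_p:
            return True
    return False


def satisfied(constraint, paths, p, q):
    """A constraint holds for a site when no path matches its negated pattern."""
    return not any(constraint(path, p, q) for path in paths)


P1 = ["Id1", "Id2", "Id4"]
P2 = ["Id1", "Id3"]

print(satisfied(violates_immediately_before, [P1, P2], "Id2", "Id4"))  # C1: True
print(satisfied(violates_immediately_before, [P1, P2], "Id3", "Id1"))  # C2: False
```

The two results agree with the discussion of C1 and C2: no path shows Id4 without Id2 directly before it, whereas both paths show Id1 with nothing before it.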

4.4 Visualisation Issues

Visualisation issues are not the main concern in this work, but we summarise 
here the link to visualisation. Recent technologies for the Web such as XML [10] 
and style sheets [12] reinforce the idea of separating Web site content
from its presentational form. In this view, visualisation specification is done by
style sheets, which describe how the information is presented. There are many 
activities on defining languages and standards for style sheets, such as CSS [3] 
and XSL [11]. 

Our current implementation translates the logic descriptions into HTML via 
Pillow. Since we have a separate description for the visualisation, this description 
acts as a style sheet. 

We also have a CSS style sheet that defines some presentational attributes, 
like font type, font size, text color, background color, etc. These attributes are 
also defined by the site designer and are also part of the visualisation description 
as simple predicates, that are transformed into the CSS style sheet. 

Type information (which appears as part of information content description) 
is used to associate each piece of information with a particular style of visuali- 
sation. For example, from the expression group_aims(X), we can define a style to 
present X, which can be a bullet list, a table or plain text. 

Styles are expressed using definite clause grammar rules that transform a
piece of information into a sequence of Pillow terms that are used to produce
HTML code. These expressions have the general form:

style(p(A1, ..., An)) -> [T1, ..., Tn].

where each Ai is a specific argument value of predicate p and each Ti is a Pillow
term corresponding to the visualisation of Ai. Some examples of the application
of these expressions are presented in the next section. The architecture of the 
system allows replacement of the target languages used to generate the Web 
site code. Currently we are using HTML and CSS, but XML and XSL could 
also be used. Changes in the target language do not have any impact on the 
intermediate representation. 



5 Generating Web Site Code 

The intermediate representation combined with the visualisation description
provides all the necessary information to produce the site code. Three main steps
are followed to produce the site code: (1) check constraints; (2) given the
intermediate representation, generate the page structures, including content,
links and operation calls; and (3) given visualisation descriptions in style
sheets, map each page structure into HTML/CSS code.

Here we have the complete transitions specification for the research group
Web site, depicted by Figure 2. The following discussion on code synthesis refers
to this description.

1  display(home, [group_aims(X), contact_info(Y)]) ←
       group_aims(X) ∧
       contact_info(Y).

2  display(home, [ ]) ⇒ display(people, [ ]).

3  display(home, [ ]) ⇒ display(publications, [ ]).

4  display(home, [ ]) ⇒ display(projects, [ ]).

5  display(people, [current_members(NamesCurr), previous_members(NamesPrev)]) ←
       NamesCurr = {N1 : person(N1, current_member)} ∧
       NamesPrev = {N2 : person(N2, previous_member)}.

6  display(people, [ ]) ⇒ display(home, [ ]).

7  display(people, [ ]) ⇒ display(publications, [ ]).

8  display(people, [ ]) ⇒ display(projects, [ ]).

9  display(publications, [pubs_list(Pubs)]) ←
       Pubs = {[T, AA, R, Year, A] : publication(T, AA, R, Year, A)}.

10 display(publications, [ ]) ⇒ display(home, [ ]).

11 display(publications, [ ]) ⇒ display(people, [ ]).

12 display(publications, [ ]) ⇒ display(projects, [ ]).

13 display(projects, [project_titles(AllT)]) ←
       AllT = {T : project(T, Abstract, People)}.

14 display(projects, [ ]) ⇒ display(home, [ ]).

15 display(projects, [ ]) ⇒ display(people, [ ]).

16 display(projects, [ ]) ⇒ display(publications, [ ]).

17 display(projects, [project_titles(AllT)]) ⇒

18 display(T, [proj_details(T, A, PI)]) ←
       T ∈ AllT ∧
       project(T, A, PI).

Constraints are checked as described in section 4.3. If the site description 
conforms with the constraints, the second step is to build a list where each 
element describes a structure for each page of the site. 

The page structure can take two different forms, depending on whether the page
is a static one or generated by an operation (via a CGI program). For static pages
the structure is:

page(Id, ContentInfo, Links, OperationCalls)

where Id and ContentInfo are the same as defined earlier, but ContentInfo is
fully instantiated. Links is a list with all page Ids that the current page is linked
to via hyper-links and OperationCalls is a list containing operation names and
their corresponding input arguments.

For pages resulting from operations, the structure is the following:

program(PId, OperationSpec, ContentInfo, Links, OperationCalls)

where PId is the identifier of the page and OperationSpec is composed of the
name of the operation and its input/output arguments. The remaining features
are the same as in the static page structure.

As an example of page synthesis, we show the complete synthesis process
for the people page of the Software Systems and Processes Group Web site
at http://www.dai.ed.ac.uk/~joaoc/ssp/people.html. This is done by using a
generic page definition which is instantiated by the specific transition rules for
the example via path constraints. Our generic page definition is as follows:

page(PageId, ContentInfo, Links, OperationCalls) ←
    ContentInfo = {I | visible(PageId, I)} ∧
    Links = {L | link(PageId, L)} ∧
    OperationCalls = {Op | op_call(PageId, Op)}.

Visible items on the page correspond to any item being displayed on a path.
Formally:

visible(PageId, I) ← path ⊗ display(PageId, S) ⊗ path ∧ I ∈ S

Links from a page are those page identifiers which may immediately follow
the page on any path. Formally:

link(PageId, L) ← path ⊗ display(PageId, _) ⊗ display(L, _) ⊗ path

The page structure generated for the example using the above definitions is:

page(people,
     [current_members(['Chris', 'Dave', 'Daniela', 'Jessica', 'Joao',
                       'Virginia', 'Stefan', 'Yannis']),
      previous_members(['Steve', 'Renaud', 'Alberto'])],
     [home, projects, publications],
     [ ]).

The lists with previous and current members' names are instantiated using
rule 5 and a database containing facts of the form person(Name, Status). Links
are defined in transition rules 6, 7 and 8 and there is no operation call for this
page.
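
The generic page definition can be read operationally, as in the following Python sketch. It is only an illustration of the visible and link definitions: each path is a list of (page id, information items) display steps, the sample paths are hypothetical abbreviations of the research group site, and operation calls are left empty.

```python
# Each path is a list of display steps: (page id, list of information items).
PATHS = [
    [("home", ["group_aims", "contact_info"]),
     ("people", ["current_members", "previous_members"])],
    [("people", ["current_members", "previous_members"]), ("home", [])],
    [("people", []), ("projects", [])],
    [("people", []), ("publications", [])],
]


def visible(paths, page_id):
    """Items I such that some path contains display(page_id, S) with I in S."""
    items = []
    for path in paths:
        for pid, info in path:
            if pid == page_id:
                for item in info:
                    if item not in items:
                        items.append(item)
    return items


def links(paths, page_id):
    """Pages L that immediately follow page_id on some path."""
    found = set()
    for path in paths:
        for (pid, _), (nxt, _) in zip(path, path[1:]):
            if pid == page_id:
                found.add(nxt)
    return sorted(found)


def page(paths, page_id):
    """Build (PageId, ContentInfo, Links, OperationCalls); no operation calls here."""
    return (page_id, visible(paths, page_id), links(paths, page_id), [])


print(page(PATHS, "people"))
```

For the people page this yields the two member lists as content and home, projects and publications as links, mirroring the structure shown above.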

Finally, the style expressions used to map the people page to HTML code
define how the information is to be rendered. A general style for all pages
of the site is defined, including the group logo image, colors, etc. For example,
the styles applied to the member lists are:

style(current_members(X)) -> [h2('Current Members'), itemize(X)].

style(previous_members(X)) -> [h2('Previous Members'), itemize(X)].

We have also defined a specific style for clustered links, called build_navigator,
which, given a list of links, builds an HTML table. The resulting visualisation for
this style is shown in Figure 5, which also presents the complete synthesis result
for the people page. The boxes with rounded corners correspond to the page
specifications that come from the page structure and the other boxes show the
styles that are applied to them.
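
To make the role of style expressions concrete, here is a small Python stand-in for the two member-list styles. It does not use Pillow: the helpers h2 and itemize simply emit HTML strings, and the STYLES table and render function are hypothetical names chosen for this sketch.

```python
from html import escape


def h2(text):
    """Render a second-level heading."""
    return f"<h2>{escape(text)}</h2>"


def itemize(items):
    """Render a list of strings as an unordered HTML list."""
    rows = "".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<ul>{rows}</ul>"


# One style per information predicate, mirroring style(current_members(X))
# and style(previous_members(X)); each returns a list of HTML fragments.
STYLES = {
    "current_members": lambda names: [h2("Current Members"), itemize(names)],
    "previous_members": lambda names: [h2("Previous Members"), itemize(names)],
}


def render(content_info):
    """Apply the style registered for each (predicate, value) pair, in order."""
    fragments = []
    for predicate, value in content_info:
        fragments.extend(STYLES[predicate](value))
    return "\n".join(fragments)


print(render([("current_members", ["Chris", "Dave"]),
              ("previous_members", ["Steve"])]))
```

Because the page structure only names predicates, replacing the entries of the style table changes the presentation without touching the site description.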

All the other pages of the site are generated in a similar fashion. The current
synthesiser is implemented in SICStus Prolog 3.5 and the complete example
research group Web site can be visited at
http://www.dai.ed.ac.uk/~joaoc/ssp/home.html.

6 Conclusions

A main contribution of this work is to describe a design approach that joins 
different levels of description to produce Web sites consistent with a correspond- 
ing high level description. Although this approach is domain-specific, we observe 
that a large number of current Web sites have similar domain features. 




Fig. 5. Synthesis result: People page 



For these, our approach supports different visualisation descriptions for the 
same site specification. These visualisations might be tailored to different classes 
of users or to other needs. Changes in the site description without changing 
the visualisation specification are also supported. This changes the role of site 
maintenance from HTML hacking to alteration of a much simpler domain-specific 
problem description, with the site being automatically regenerated from this. 

The actual Software Systems and Processes Group Web site (which has been
in routine use for the past 3 years) is automatically generated from
specifications similar to those presented here. The cost of developing the
synthesiser for the group site was justified after only a few weeks by the savings
in maintenance effort [9]. The site can be visited at
http://www.dai.ed.ac.uk/groups/ssp/index.html.


Abstract. The task of keeping any form of consistency in websites is
difficult. Websites exist for long periods. Over this period of time, 
many people author webpages for a given website. People are not good 
at consistently following sets of rules. XML can assist webpage 
authors by ensuring that webpages of the same type have the same 
kinds of semantic content. 

No system exists to help maintain complex relationships (cross- 
references) between webpages. This paper presents an XML extension 
that describes relationships between XML documents and shows how 
this extension can be used to capture business rules. 



1 Introduction 

A DTD (document type definition) is the part of an XML document that describes 
what semantic content will be present in an XML document. The DTD sections of 
XML documents can easily be shared between different documents. Standard DTDs 
are being developed for specific document types such as CML (Chemical Markup
Language) and MathML (Mathematical Markup Language) [6].

Meta-XML is an XML extension that specifies relationships between documents. 
Document sets such as legal documents about a case, medical records of an 
individual, and instruction manuals for the maintenance of equipment are authored by 
multiple people over time and have relationships between individual documents 
within the set that must be maintained. Losing these relationships can have financial 
and/or life-threatening consequences. 

Meta-XML specifications are XML documents. The Meta-XML specification uses 
two XML extensions: XLL (XLinks and XPointers) and the where clause from an 
XML query language. Meta-XML assumes that XLL is being used to specify links 
between XML documents. XPointers are used to specify what section of an XML 
document is supposed to either contain or not contain a specific link. The where 
clause from an XML query language is used to specify which XML documents are of 
interest. 

A webmaster is normally the person who is tasked to maintain relationships
between webpages. As a website contains more and more documents, this task can
become overwhelming or even be ignored by an organization. A system that enforces
Meta-XML specifications will support the person who is responsible for maintaining
these relationships.

S. Murugesan and Y. Deshpande (Eds.): WebEngineering 2000, LNCS 2016, pp. 204-212, 2001.

The paper shows how a Meta-XML specification can be used to express business 
rules on the relationship between documents. It presents a business scenario that 
contains rules on the relationship between information. Then, it describes a Meta- 
XML specification that expresses these business rules. Lastly, it briefly discusses 
how Meta-XML specifications can automatically be enforced. 



2 Example 

Distributing instruction manuals on how to maintain medical equipment can easily be 
performed over the Web. Errors in these manuals can be responsible for injuring or 
even killing patients. If the repair technician cannot understand a specific instruction 
within the manual, the technician needs to be able to contact a responsible individual. 
Not having responsible individuals available can put the manufacturer of the medical 
equipment at legal risk. Therefore, keeping up-to-date contact information available 
on a website being used to distribute the maintenance manuals is of financial 
importance to the company that makes the medical equipment. 



2.1 Business Rules 

The manufacturer of the medical equipment requires the following relationships 
between instruction pages, individuals and departments. 

1. Each instruction page has an employee listed as a contact.

2. Each instruction page has a department listed as a contact.

3. A contact employee who is not a senior architect must be a member of the
contact department.

4. All contact people must be employed by the manufacturer for at least three 
months or be senior architects. 

5. A person on probation cannot be a contact person. 

If these business rules are maintained, a technician can contact an individual or the 
department for clarification on an instruction page. 



2.2 The Problem 

The number of instruction pages for a specific manufacturer can become very large. 
These pages will need to be changed as new products are added, products are 
removed, people join and leave the firm, and the information about how to maintain 
the medical equipment changes. Multiple people over long periods will be updating 
these documents. Without software support, these business rules will have to be 
enforced by each person who makes a change. It is unlikely that everyone who makes 
the change will follow the business rules every time. When the rules are not
followed, contact information may be lost, making the manufacturer legally
vulnerable because the manufacturer did not provide appropriate contact
information for maintaining the equipment.

3 Solution

These business rules, which define relationships between documents, can be
encoded in Meta-XML. This provides a formal specification of the rules that can
be automatically enforced. How to build a Web server that can automatically
enforce Meta-XML specifications is discussed in section 3.3.

To enforce a Meta-XML specification, the server needs the following:

1. The Meta-XML specification itself,

2. The instruction pages,

3. The home pages of the contact individuals,

4. The department pages,

5. The page listing the people on probation, and

6. The list of senior architects.

3.1 XML Documents 

For this example, five XML DTDs need to be constructed: one each for the
instruction page, the employee page, the department page, the senior architect
page, and the probation page. The instruction page contains a contact element.
The employee page contains a hire date element with a start date attribute.



3.2 The Meta-XML 

The Meta-XML specification is an XML document that follows the Meta-XML DTD,
which is given in an appendix at the end of this paper. This example has been
compiled against this DTD.

<?xml version="1.0"?>
<!DOCTYPE metaxml SYSTEM "metaxml.dtd">
<metaxml>

The nodeDefs element describes and names the XML documents and identifies which
part of each document is of interest in the patterns of links.

Only the contacts element of the instruction pages is of interest. The XPointer
referring to the contacts element indicates this. Normally, in a nodeDefs an XML
document is tested against the typeTest's conditions (DTD, whereClause, and
existence of the section of the XML document referred to by the XPointer) in
sequence, and then the section specified by the XPointer is selected. This is
insufficient to extract all the contacts elements from an instruction page.
Declaring the nodeDefs to be recursive specifies that every possible instance
of all the typeTests is to be found in the XML document. In addition, an absent
whereClause means that no whereClause test is performed.

  <nodeDefs recursive="true">
    <typeTest
      name="InstructionContacts"
      DTD="InstructionPage"
      XPointer="root().child(*,contacts)"/>
  </nodeDefs>

Employees who are not senior architects are allowed to be contacts only if they
have been with the company for more than three months. The where clause will
exclude all employees who have been with the company for less than 3 months. The
language of the where clause in the example is arbitrary. When a standard for XML
query languages has been accepted, Meta-XML should use the where clauses from
that standard [4], [5]. Also, the absence of an XPointer attribute means that the
whole document is the node.

  <nodeDefs>
    <typeTest
      name="NewEmployee"
      DTD="EmployeePage"
      WhereClause="root().hired.date - 3months .gt. CURRENTDATE"/>
  </nodeDefs>

The last four node types are just XML documents that follow given DTDs. 

  <nodeDefs>
    <typeTest
      name="Probation"
      DTD="ProbationPage"/>
    <typeTest
      name="Department"
      DTD="DepartmentPage"/>
    <typeTest
      name="SeniorArchitect"
      DTD="SeniorArchitectList"/>
    <typeTest
      name="Employee"
      DTD="EmployeePage"/>
  </nodeDefs>

The relationship section defines which patterns of relationships are excluded and 
which are required to be in larger patterns. The relationships described below express 
the business requirements given above. 

Business rule 5 states that a pattern of links from the set of documents should be
excluded. Specifically, if an employee's page is referred to by the probation page
as a member of the people on probation, the link from an instruction page to that
employee's page as a contact is not allowed. This is modeled in Meta-XML as an
excluded relationship.

  <relationship description="Only employees who are not on probation are allowed to be contacts.">

An excluded relationship is a relationship between XML pages that is not wanted.
The business rule "Employees on probation are not allowed to be contacts for
instruction pages" is defined below.

    <excluded>

The simple link definition describes an XLink within the Probation page of member
type that points to an employee. The labels are used to mark two nodes as being the
same across different link specifications. Since XLL supports both simple and
multi-part (extended) links, the link specification starts with the word simple or
extended.

      <link type="contact">
        <simple>
          <source type="InstructionContacts" label="1"/>
          <target type="Employee" label="2"/>
        </simple>
      </link>
      <link type="member">
        <simple>
          <source type="Probation" label="3"/>
          <target type="Employee" label="2"/>
        </simple>
      </link>
    </excluded>
  </relationship>

This link specification describes an XLL of type contact from the contact section 
of an instruction page to an established employee. The labels are scoped to only have 
meaning within the context of one pattern specification. 
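
One way to read the excluded pattern operationally is sketched below in Python. This is not the enforcement algorithm of [1]: links are modelled as (link type, source, target) triples, node types as a dictionary, and all document names are hypothetical sample data. The shared label 2 becomes the requirement that both links have the same target.

```python
# Node types assigned by the nodeDefs, and typed links between documents.
NODE_TYPES = {
    "pump-manual#contacts": "InstructionContacts",
    "probation-list": "Probation",
    "alice": "Employee",
    "bob": "Employee",
}
LINKS = [
    ("contact", "pump-manual#contacts", "alice"),
    ("contact", "pump-manual#contacts", "bob"),
    ("member", "probation-list", "bob"),
]


def excluded_matches(links, node_types):
    """Find (instruction contacts, employee, probation page) triples matching
    the excluded pattern: a contact link and a probation member link that
    share the same target employee (label 2 in the specification)."""
    matches = []
    for kind1, src1, tgt1 in links:
        if kind1 != "contact" or node_types.get(src1) != "InstructionContacts":
            continue
        if node_types.get(tgt1) != "Employee":
            continue
        for kind2, src2, tgt2 in links:
            if (kind2 == "member" and node_types.get(src2) == "Probation"
                    and tgt2 == tgt1):
                matches.append((src1, tgt1, src2))
    return matches


print(excluded_matches(LINKS, NODE_TYPES))
# [('pump-manual#contacts', 'bob', 'probation-list')]
```

Any match is a violation: a change that would create one, such as adding bob as a contact while he is on probation, would be rejected.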

The business rules 1 through 3 state that a document cannot be published unless the 
related documents are present. For an instruction page to be present, it is required that 
it exists in a larger pattern of links. This is stated by the monitored element of a 
required pattern. 

  <relationship description="An instruction page must have appropriate contacts.">
    <required>
      <monitored description="This document requires appropriate contacts.">
        <node type="InstructionContacts" label="1"/>
      </monitored>

One of two patterns is required for an instruction page to be present. Each
satisfying element describes one of the two patterns. The first pattern is that the
instruction page must be linked to a department page and to the page of an
employee who is a senior architect. Note that the source is the same node as the
node in the monitored pattern.

      <satisfying description="A senior architect and a department are appropriate contacts.">
        <link type="contact">
          <simple>
            <source type="InstructionContacts" label="1"/>
            <target type="Employee" label="2"/>
          </simple>
        </link>
        <link type="contact">
          <simple>
            <source type="InstructionContacts" label="1"/>
            <target type="Department" label="3"/>
          </simple>
        </link>
        <link type="member">
          <simple>
            <source type="SeniorArchitectList" label="4"/>
            <target type="Employee" label="2"/>
          </simple>
        </link>
      </satisfying>

The second pattern covers business rules 3 and 4: for an instruction page to be
published, the contact employee must have been with the company for at least 3
months and must be a member of the contact department. Meta-XML expresses the
requirement that the employee be with the company for at least 3 months using a
where clause that excludes all employee pages of employees who have been with the
company less than 3 months. Only the remaining employees meet the definition of an
EstablishedEmployee node.

      <satisfying description="An employee who has been with the company for 3 months and a department are appropriate contacts if also the employee is member of the department">
        <link type="contact">
          <simple>
            <source type="InstructionContacts" label="1"/>
            <target type="EstablishedEmployee" label="2"/>
          </simple>
        </link>
        <link type="contact">
          <simple>
            <source type="InstructionContacts" label="1"/>
            <target type="Department" label="3"/>
          </simple>
        </link>
        <link type="member">
          <simple>
            <source type="Department" label="3"/>
            <target type="EstablishedEmployee" label="2"/>
          </simple>
        </link>
      </satisfying>
    </required>
  </relationship>
</metaxml>
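
The required relationship can be read as "the monitored node must match at least one satisfying pattern". The Python sketch below illustrates that reading only; it is not the algorithm of [1]. Links are (link type, source, target) triples, the set named established stands in for the employees that pass the where clause, and all names and data are hypothetical.

```python
def targets(links, kind, source):
    """Targets of all links of the given type leaving `source`."""
    return [t for k, s, t in links if k == kind and s == source]


def satisfies_required(page, links, node_types, established):
    """True if `page` (an InstructionContacts node) matches one of the two
    satisfying patterns of the required relationship."""
    contacts = targets(links, "contact", page)
    employees = [c for c in contacts if node_types.get(c) == "Employee"]
    departments = [c for c in contacts if node_types.get(c) == "Department"]
    for emp in employees:
        for dept in departments:
            # Pattern 1: the employee is listed by the senior architect list.
            senior = any(
                k == "member" and t == emp
                and node_types.get(s) == "SeniorArchitectList"
                for k, s, t in links
            )
            # Pattern 2: an established employee who is a member of the department.
            member = emp in established and emp in targets(links, "member", dept)
            if senior or member:
                return True
    return False


NODE_TYPES = {
    "manual#contacts": "InstructionContacts",
    "alice": "Employee",
    "service": "Department",
    "architects": "SeniorArchitectList",
}
LINKS = [
    ("contact", "manual#contacts", "alice"),
    ("contact", "manual#contacts", "service"),
    ("member", "architects", "alice"),
]

print(satisfies_required("manual#contacts", LINKS, NODE_TYPES, established=set()))
# True: alice is a senior architect, so the first pattern applies
```

An instruction page for which this check fails would not be accepted for publication.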

3.3 Comments about Implementation

The value of Meta-XML increases greatly if it can be automatically enforced. In
previous work, the authors have described algorithms [1] and a software architecture
[2] that can enforce Meta-XML specifications. It is critical that all
modifications to the set of XML documents constrained by the Meta-XML
specification be performed through a software system that enforces the Meta-XML
specification.

This software system is a form of constrained database. The specification 
assumes that the system is able to find all links pointing both from and to a specific 
document. If this is true, the pattern definitions are a form of object-centered 
constraints [3]. Unlike general constraints, object-centered constraints do not require a 
search through the whole database when a constrained object (document) is changed, 
but only require tracing the links specified within the constrained objects. This 
ensures that the transaction time to change an XML document that is referred to by 
some pattern in a Meta-XML document will not increase as the XML 
document set grows. 
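
The locality argument can be made concrete with a small sketch. The code below is not the authors' algorithm [1]; it assumes a hypothetical in-memory link index (LinkStore) and checks the contact pattern of Section 3.2 for one changed instruction page by following only that page's links. All class, function, and document names are illustrative assumptions.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical index of typed links, queryable from either end."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # document -> {(link type, target)}
        self.incoming = defaultdict(set)  # document -> {(link type, source)}

    def add_link(self, source, link_type, target):
        self.outgoing[source].add((link_type, target))
        self.incoming[target].add((link_type, source))

    def targets(self, source, link_type):
        """Documents reached from source over links of the given type."""
        return {t for (lt, t) in self.outgoing.get(source, ()) if lt == link_type}


def has_valid_contact(store, page, established_employees, departments):
    """Check one changed page by tracing its own links, not the whole store.

    The pattern holds if the page has a contact department and a contact
    employee who is established and is a member of that department.
    """
    contacts = store.targets(page, "contact")
    for department in contacts & departments:
        members = store.targets(department, "member")
        if contacts & members & established_employees:
            return True
    return False


if __name__ == "__main__":
    store = LinkStore()
    store.add_link("page1", "contact", "alice")
    store.add_link("page1", "contact", "sales")
    store.add_link("sales", "member", "alice")
    print(has_valid_contact(store, "page1", {"alice"}, {"sales"}))
```

The work done by has_valid_contact depends on the number of links leaving the changed page and its contact departments, not on the number of documents in the store, which is the property the paragraph above relies on.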



4 Conclusion 

Meta-XML can express business requirements about relationships between XML 
documents. It does this by specifying which parts of specific XML documents are of 
interest and what patterns of links are excluded and required in the document. This 
specification is rich enough to express complex relationships between XML 
documents. A web server that enforces a Meta-XML specification can help 
webmasters maintain patterns of links within their sites. 



5 Future Work 

No software system has been built to enforce Meta-XML specifications. The authors 
plan to build one. This will allow a platform to refine the specification and test its 
usefulness. 

A significant issue in the usefulness of such a server is how well authors of pages can 
understand and fix patterns of links when their changes are rejected by the system. A 
key to this is the quality of the feedback given when an error occurs. The Meta-XML 
specification contains a large amount of information about the patterns that 
have been rejected and what patterns are needed for a change to be accepted by the 
system. This information should be returned to the authors of the XML documents. 

Meta-XML is a graph language. It may be easier to graphically draw Meta-XML 
specifications than to write them. The authors also plan to create a graphical interface 
for writing Meta-XML specifications. 
6 Appendix: Meta-XML DTD 

The following DTD is the Meta-XML specification. It can express relationships more 
complicated than the ones above. It can express relationships where a node contains 
or is contained by another node. It also can express relationships containing links 
with multiple targets (extended). It is included so that people who wish to continue 
work in this area have access to this specification. 



<!-- This DTD is the definition of Meta-XML --> 
<!ELEMENT metaxml (nodeDefs+, relationship+)> 
<!ELEMENT nodeTypes (nodeDefs+)> 
<!ATTLIST nodeDefs recursive (true | false) #IMPLIED> 
<!ELEMENT typeTest EMPTY> 
<!ATTLIST typeTest name CDATA #REQUIRED> 
<!ATTLIST typeTest DTD CDATA #IMPLIED> 
<!ATTLIST typeTest whereClause CDATA #IMPLIED> 
<!ATTLIST typeTest XPointer CDATA #IMPLIED> 
<!ELEMENT relationship (excluded | required)> 
<!ATTLIST relationship description CDATA #REQUIRED> 
<!ELEMENT excluded (node | link+)> 
<!ELEMENT node EMPTY> 
<!ATTLIST node type CDATA #REQUIRED> 
<!ATTLIST node label CDATA #REQUIRED> 
<!ELEMENT link (simple | extended)> 
<!ATTLIST link type CDATA #IMPLIED> 
<!ELEMENT simple (source, target)> 
<!ELEMENT source EMPTY> 
<!ATTLIST source type CDATA #REQUIRED> 
<!ATTLIST source label CDATA #REQUIRED> 
<!ATTLIST source contains CDATA #IMPLIED> 
<!ATTLIST source containedBy CDATA #IMPLIED> 
<!ELEMENT target EMPTY> 
<!ATTLIST target type CDATA #REQUIRED> 
<!ATTLIST target label CDATA #REQUIRED> 
<!ATTLIST target role CDATA #IMPLIED> 
<!ATTLIST target roleRequired (true | false) #IMPLIED> 
<!ATTLIST target contains CDATA #IMPLIED> 
<!ATTLIST target containedBy CDATA #IMPLIED> 
<!ELEMENT required (monitored, satisfying+)> 
<!ELEMENT monitored (node | link+)> 
<!ATTLIST monitored description CDATA #REQUIRED> 
<!ELEMENT satisfying (link+)> 
<!ATTLIST satisfying description CDATA #REQUIRED> 
<!ELEMENT extended (source, target+)> 
<!ATTLIST extended complete (true | false) #REQUIRED> 
Abstract. Traditionally software is loaded onto, or downloaded to, a 
user’s PC where it is then executed. The ubiquitous Web, however, 
allows another choice: a Web front end for data collection residing on 
the user’s PC, a Web service which runs on a remote server, and Web 
delivery of service output back to the user’s PC. We are used to seeing 
this latter architecture used for information retrieval and simple 
e-commerce transactions, but the same architecture can also be used for 
more complicated services. We present two case studies where we 
developed complex Web-enabled services that are currently being used 
by clients over the public Internet. From a software-engineering 
standpoint, we explain the compelling reasons for developing these 
applications using this architecture and the advantages we experienced 
over traditional methods. We also highlight some of the trials of 
developing, testing, marketing, and maintaining such complex 
Web-enabled services. 

Keywords: Web-based service, internet services, advantages of 
Web-enabled services, case study. 



1 Introduction 

At Telcordia Technologies¹ we do research related to information services and 
analysis. This research often leads to the creation of systems and/or services that 
implement our research ideas. Over the past several years, we have implemented 
these systems using the conventional method of developing prototypes using 
traditional languages like C and C++ and implementing the user interfaces using 
Tcl/Tk or X Windows™. The prototypes are used for internal presentations, where the 
system, and the research, is given critical evaluation. If the system is deemed worthy 
of being a commercial product, the prototype is transferred to a Telcordia Software 
Systems Business Unit to be developed as a traditional software system using 
Telcordia’s Quality Method of Operation (QMO), a software development and quality 
control process. We have had limited success in developing and marketing our 
current research using this approach. 

¹ Formerly Bellcore. 

Thus, as the Web and Web-based development software matured, we noticed 
some compelling advantages in developing Web-based services instead of standalone 
systems². This has led to the development of several Web-based tools and services 
during the last two years. This model has had an almost immediate impact, as we 
are now able to deliver forward-looking research technology rapidly and efficiently, 
directly to trial users. 

Two of the several research endeavors that resulted in a Web-based service were a 
test case generation service (AETG™ Web) and Year 2000 test generation software 
(Year2000TGF™). In this paper, we present our experiences in developing, testing, 
marketing, and maintaining these two Web-enabled services. The AETG™ Web 
service was developed earlier as a standalone system, so for this case study, we are in 
a position to make comparison statements between Web engineering and traditional 
software engineering approaches. The Year2000TGF™ software was designed as a 
Web-based service from the beginning. 

The remainder of this paper is organized as follows. Section 2 describes the 
distinction between a Web service and a traditional software system. In this section, 
we also highlight the issues and considerations for deploying our research ideas using 
Web-based services instead of using more traditional software tools. Sections 3 and 4 
provide more details on the requirements of the AETG™ Web Service and the 
Year2000TGF™ software, respectively. In Section 5, we present the advantages of 
Web-enabled services. In Section 6, we discuss the issues and the workarounds 
needed for Web-enabled services. 



2 Web-Enabled Service vs. Traditional Standalone Systems 

A Web-based or Web-enabled service is a service deployed over an Internet/Intranet 
where the client of the service accesses the service using a conventional Web 
browser. 

In a Web-enabled service, the software application does not need to reside on the 
client’s computer. In our case, the core of the application and all the intellectual 
property within the system resided on a special segment of Telcordia’s intranet. This 
segment is connected to the public Internet via a firewall. Users access the system as a 
service: they submit their problem requirements from their local computer and 
receive their solutions as output sent back to that computer. The billing can 
be based on the amount of usage, e.g., number of transactions executed or hours of 
usage, or on a fixed-price yearly basis. 

In traditional standalone systems, clients license the whole system, install it on 
their machines, and use it to perform the required operations. Beyond the price of the 
software, transaction or usage-based billing is usually not applicable. 

As Internet access becomes more readily available, Web-based services provide an 
easy way to reach clients quickly. Clients are not required to download or install any 
custom software on their machines and they are guaranteed to have access to the latest 
version of the system every time they use it. A good research idea can take a long 
time to mature into a tool that can be widely used. In order to make a business 
case for a new research idea, there has to be a critical mass of users supporting the 
technology. Typically this is done by developing a prototype tool embodying the new 
research idea and then putting in great effort to attract a critical mass of users who use and 
like the tool. Developing, documenting, drumming up support, training, and 
consulting can be a burden for a research team, since it is not necessarily equipped 
with the resources to do this. All these issues and considerations gave us clear 
reasons to move in the direction of developing Web-enabled software 
services. 

² We will discuss the issue of Web-based service vs. standalone systems in detail in Section 2. 

The reach of the Internet and of Web-enabled services helps solve many of these 
problems. One big impact of Web engineering has been the reduced time to market. It 
has been our experience that Web-enabled solutions are easier to develop, especially 
using an iterative approach. New features are easier to build, old features are easier to 
modify, and none of the changes require a massive rework of the design of the 
system. Prototyping is easier to do. Web delivery makes large-scale usage trials 
easy to run. More details on the advantages of a Web-enabled service will be 
presented in Section 5. 

Another compelling reason for using Web-enabled services is the protection of 
intellectual property. In our case this was a special concern. We had developed highly 
sophisticated test data generation algorithms that explore an exponential search space 
using constraints in real time. Though the algorithms were protected by patents, we feared that the 
software might be reverse engineered and/or decompiled. Also, for software 
embodying serious intellectual property, reverse engineering becomes a special 
concern when licenses expire and clients are expected to ‘purge’ their systems of the 
existing copies of the software. Bureaucratic overhead related to tracking expired 
licenses can be very high and resource consuming. 

The economics involved in delivering Web-based solutions are also drastically 
different. While a detailed economic study is beyond the scope of this 
paper, we must point out that a traditional system can expect to be priced an order of 
magnitude higher than a Web-based service³. Though we do not have studies to 
present, there currently seems to be a psychological perception that Web users should 
pay considerably less for a Web-based product than they would pay for a traditional 
standalone product. This implies that a Web-based service needs more 
users in order to have comparable revenues. On the other hand, there are economies in 
Web-based services: the software and the documentation do not have to be 
packaged and shipped, and help-desk support has to be maintained only for the version or 
versions made available at the Web site. 

³ One can expect that as the Web becomes more reliable and more services become available 
over the Web, the difference in pricing will grow. 



3 AETG™ Web Service 

The AETG™ Web Service [2] is an industrial-strength service developed by Telcordia 
researchers for enabling model-based testing. As software systems become more 
complex and time to market becomes shorter, the deficiencies in ad hoc and manual 
testing approaches become quite evident. One clear trend in testing in the future will 
be model-based automated testing. In model-based automated testing, the functional 
test requirements of the system are modeled first and then test cases, based on this 
model, are automatically generated. 
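
To illustrate what generating test cases from a model can look like, the sketch below uses a generic greedy pairwise-coverage strategy. This is not the AETG™ algorithm, which this paper does not describe; it assumes, purely for illustration, that the model is a set of parameters with finite value lists and that the goal is to cover every pair of parameter values at least once. All names are hypothetical.

```python
from itertools import combinations, product


def all_pairs(model):
    """Every (param_a, value_a, param_b, value_b) pair the tests must cover."""
    names = sorted(model)
    pairs = set()
    for a, b in combinations(names, 2):
        for value_a, value_b in product(model[a], model[b]):
            pairs.add((a, value_a, b, value_b))
    return pairs


def pairs_of(test):
    """The value pairs exercised by one complete test case."""
    names = sorted(test)
    return {(a, test[a], b, test[b]) for a, b in combinations(names, 2)}


def generate_pairwise(model):
    """Greedily pick the combination that covers the most uncovered pairs.

    Candidates are enumerated exhaustively, so this is only suitable for
    small models; it is a teaching sketch, not a production generator.
    """
    names = sorted(model)
    candidates = [
        dict(zip(names, values))
        for values in product(*(model[name] for name in names))
    ]
    uncovered = all_pairs(model)
    tests = []
    while uncovered:
        best = max(candidates, key=lambda test: len(pairs_of(test) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests


if __name__ == "__main__":
    model = {
        "os": ["linux", "windows"],
        "browser": ["netscape", "ie"],
        "protocol": ["http", "https"],
    }
    for test in generate_pairwise(model):
        print(test)
```

The point of the sketch is that the tester maintains only the model; the test cases are derived from it and can be regenerated whenever the model changes.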

The AETG™ Web Service employs unique and promising technologies for both 
capturing the model and automatically generating efficient test cases from this model. 
Users interact with the service using a Web browser. New users of the system use the 
Web UI widgets to input the functional requirements of the System Under Test 
(Figure 1 displays a typical user interface page). Test cases generated by the AETG™ 
Web Service have revealed more failures than test cases selected 
using manual approaches. References [1, 3, 4] report some of the experiences in using this 
technology in testing very large systems. 




Figure 1: An AETG™ Web Service User Interface Page 

The AETG™ Web Service is now one of the Web-based services provided by 
Telcordia to its many industrial customers. Applications have been modeled in the 
areas of telecommunications, finance, operating systems, and network interfaces. 
4 Year2000TGF™ Software 

Year2000TGF™ software is innovative software developed at Telcordia to aid users 
testing systems updated to resolve Year 2000 problems. Year2000TGF™ software 
leverages the key principle that essential business logic does not change during Year 
2000 renovation and, hence, reinvesting in the creation of brand new test cases for 
Year 2000 compliance testing is expensive and redundant. A model for the date 
dependent business logic is extracted from the existing set of regression test cases for 
the system by using the Year2000TGF™ software, a set of user defined rules, and a set 
of built-in system rules. The extracted model is then populated with Year 2000 
sensitive dates with the provision that the business rules of the application are not 
violated. New test cases, test data, or test scenarios are then generated using one of 
the three generation strategies used by this test factory. These tests may then become 
a part of a user’s Y2K testing strategy. In the absence of regression test cases, the 
method works equally well with test data files and/or usage scenarios captured using a 
capture/replay tool. Telcordia Operations Support Systems has had various major 
successes using Year2000TGF™ software. Details about Year2000TGF™ software 
can be read in [5]. 
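
The idea of repopulating an existing test case with Year 2000 sensitive dates without violating business rules can be sketched as follows. This is not one of the tool's three generation strategies, which are not described here. It assumes the simplest possible business rule, a fixed interval between two dates, and a short, illustrative list of sensitive dates; all names are hypothetical.

```python
from datetime import date, timedelta

# Illustrative Year 2000 sensitive dates; the tool's actual list is not given.
SENSITIVE_DATES = [date(1999, 12, 31), date(2000, 1, 1), date(2000, 2, 29)]


def shift_test_case(test_case, anchor_field, new_anchor):
    """Move every date by the same delta so date-to-date intervals survive."""
    delta = new_anchor - test_case[anchor_field]
    return {field: value + delta for field, value in test_case.items()}


def generate_y2k_cases(test_case, anchor_field):
    """Produce one shifted copy of the test case per sensitive date."""
    return [
        shift_test_case(test_case, anchor_field, sensitive)
        for sensitive in SENSITIVE_DATES
    ]


if __name__ == "__main__":
    # A regression test case in which payment is due 30 days after the order.
    original = {"order": date(1997, 3, 1), "due": date(1997, 3, 31)}
    for case in generate_y2k_cases(original, "order"):
        assert case["due"] - case["order"] == timedelta(days=30)
        print(case["order"].isoformat(), case["due"].isoformat())
```

A uniform shift preserves only interval-based rules. Rules tied to weekdays, month ends, or fiscal periods would need the user-defined rules that the paragraph above mentions.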
5 Advantages of a Web-Enabled Service 

5.1 Ease of Development/Prototyping 

An important aspect of Web engineering is that the Web makes it easy to develop 
prototypes. We found the incremental development process relatively easier to follow 
for Web-engineered solutions. The AETG™ Web Service had a core API 
implemented by “C” programs that implemented the test generation algorithms. 
LiveWire⁴ server-side programming used the core API to implement the server side 
of the system. Client-side programming was done using JavaScript and HTML 
programming. The entire system consisted of nearly 65,000 lines of code. One 
developer working full time took about a year to get the Web user interface and server 
side programming completed (not including the core “C” programs). The traditional 
AETG™ system built earlier took nearly 2 staff-years to complete (not including the 
core “C” programs). 

The Year2000TGF™ software was a relatively small project consisting of about 
15,000 lines of code. One distinguishing feature of this software was that it required a 
lot of interaction with the users. The system was driven by numerous user defined 
business rules, settings, default options, etc. Using Web-based navigation to obtain 
this input made development easier than if a more traditional GUI had been used. 
For example, the user was not expected to get all the rules correct in their first 
attempt. Thus visual feedback from the software, to inform the user of input 
inconsistencies, was very important. Web-based report generation and reports with 
dynamically generated hypertext links were found to be a highly effective means of 
providing this feedback. As a matter of fact, we found this so appealing and easy to 
develop that we decided to generate Web-based reports for several other tools. 



5.2 Ease of Marketing 

The popularity of the Web and the ease of setting up user trials made it operationally 
very easy for us to market and thus get a broader user base. The Web has become a 
big platform equalizer. Whether one uses a mainframe, UNIX workstation, or a PC, it 
is very likely that it is Web-enabled. Needless to say, a broader user base gave us a 
perspective on our tool’s usage which would not have been possible using the 
traditional shrink-wrapped approach. 

5.3 System Maintenance 

There were no maintenance releases to ship to customer locations! Once bugs were 
fixed, all users had use of the updated software immediately. Similarly, as we 
introduced new algorithms to the service, everybody benefited from the upgrades 
instantly. Web-enabled services also provide greater flexibility when adding new 
features. For example, several users expressed an interest in having a direct interface 
to the AETG™ Web Service core API. Using Web engineering, HTML forms, and 
browser file transfer features, we could extend the AETG™ Web Service to 
accommodate this feature in less than a month. This feature enhancement would not 
have been possible using traditional methods since the system design would not have 
been flexible enough. 

⁴ LiveWire is Netscape’s server-side programming environment. 

5.4 Online Documentation 

Online documentation was a click away! The documentation was always up-to-date. 
There were no manuals to ship — new features and new documentation were provided 
together and manual sections found to be confusing were immediately updated. 

5.5 Online Consultation 

For a small team of developers, the benefit of on-line, internet consultation is 
tremendous. Since the service was available on the internet, value-added-resellers 
and our own researchers could demonstrate our tools without leaving our offices! 

5.6 Online Collaboration 

The Year2000TGF software application usually required input from many users to 
develop the necessary knowledge to generate the output. Because the application was 
Web based, users in different geographic locations could see and edit the work of 
others. Users have roles and access privileges to the knowledge base being created. 
While we did not incorporate extensive collaboration features in this service, in 
theory, it is easy for us to incorporate a COTS collaboration system within the 
service. One way to do so is to use a T.120 compliant system like NetMeeting that 
will allow browser sharing. However, this is real-time collaboration that is good for 
giving demos or consulting, but not suitable for teams of users working 
independently. 

A team of users working independently could collaboratively use the 
Year2000TGF™ software quite easily. User and group ids similar to UNIX file system 
ids can be associated with the users of the service. The server side can then use a 
COTS configuration management system to facilitate collaboration and saving of the 
versions of, for example, business logic rules. 

5.7 Protection of Intellectual Property 

Web service solutions protect the system internals from being decompiled or 
reverse-engineered since the core application resides on a server protected by a Web front end 
and a firewall. The application is never transferred to a user computer. 



6 Issues for a Web-Enabled Service 

6.1 Security 

One of the biggest concerns of commercial clients when they start using a Web-based 
service is the security of their data and authentication of users who access the data. 
The security issues arise at three levels: 
6.1.1 Authentication 

Both the service provider and the user want to prevent fraudulent use of the system. 
User id and password level protection is not adequate at times. For example, a client 
may share passwords to defeat the licensing agreement. In our basic security scheme 
we tied the IP address of the user’s machine with a user ID and password to provide 
an extra layer of authentication. This came at the expense of flexibility of use and did 
not work for clients who were using proxy Web servers or logging in via an Internet 
service provider. We consider this a limitation; additional cryptographic key 
authentication mechanisms are needed to prevent fraudulent use. Key management, 
however, adds an extra level of complexity to the system. We could not find a good 
COTS crypto-authentication product that was easy to use, manage, and integrate into 
our solution. 
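
The basic scheme described above, a user ID and password tied to the IP address of the user’s machine, can be sketched as follows. This is an illustrative reconstruction, not the code of our service; the AccountStore class and its methods are hypothetical, and the password hashing shown (PBKDF2 from Python's standard hashlib module) is an assumption of the sketch.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash; PBKDF2 is an assumption of this sketch."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class AccountStore:
    """Hypothetical account table binding each user ID to one client IP."""

    def __init__(self):
        self._accounts = {}  # user id -> (salt, password hash, registered IP)

    def register(self, user_id, password, ip_address):
        salt = os.urandom(16)
        self._accounts[user_id] = (salt, hash_password(password, salt), ip_address)

    def authenticate(self, user_id, password, request_ip):
        record = self._accounts.get(user_id)
        if record is None:
            return False
        salt, expected, registered_ip = record
        password_ok = hmac.compare_digest(expected, hash_password(password, salt))
        # The extra layer: the request must come from the registered address.
        return password_ok and request_ip == registered_ip


if __name__ == "__main__":
    store = AccountStore()
    store.register("alice", "s3cret", "192.0.2.10")
    print(store.authenticate("alice", "s3cret", "192.0.2.10"))  # registered machine
    print(store.authenticate("alice", "s3cret", "192.0.2.99"))  # e.g. via a proxy
```

The second call shows the loss of flexibility noted above: a legitimate user whose requests arrive from a proxy or a dynamically assigned ISP address presents the right password from the wrong address and is refused.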

6.1.2 Security During Transmission 

Clients were concerned about the security of their data during transmission over the public 
Internet. We used Secure Sockets Layer (SSL) security to address this 
concern adequately. 

6.1.3 Data Security on Service Provider’s Machine 

Clients were concerned about service providers getting hold of sensitive client data or 
other clients getting hold of the sensitive data. As a first step, we allowed the clients 
to download the data to their machines and then let them delete the data from our 
server. If they wanted to use this data again they could upload the saved file to our 
server. This solution had to be further augmented with legal verbiage in the service 
contract that allayed other fears. We felt that encryption using a client’s private key 
might help, but the service provider will still, at some stage of the execution, be able 
to see the data in the clear. This can be a serious limitation of Web-based services if the 
services are dealing with extremely proprietary data. 

6.2 Performance Problems 

Any serious service will need abundant server computing power to allow 
simultaneous access by several users. AETG™ Web uses some computationally 
intensive algorithms and our initial system had performance problems during peak 
usage hours. We upgraded our servers to multi-processor, state-of-the-art hardware to 
accommodate more simultaneous users. There is, however, a limit to this scaling 
up. Thus a designer of Web-based services has to decide judiciously what intellectual 
property should stay on the server side, and maximize client-side processing using 
applets and plug-ins. 

6.3 User Performance Expectations 

While users are used to Web delays when information surfing, when using application 
software over the Web, they expect traditional standalone system performance. This is 
a significant drawback that is very difficult to overcome. It is our hope that as 
bandwidth restrictions ease a bit, users will be more comfortable with the Web’s 
response time. 
6.4 The Myth of Platform Independence 

In theory, Web-enabled solutions are intended to be platform independent. In practice, 
we found this not to be the case. For example, JavaScript methods behave differently 
on different browsers. As a specific instance of the browser dependence of JavaScript, 
consider the onLoad method for the document page, which differs between Netscape 
Navigator™ and Microsoft® Internet Explorer™. To make matters worse, most of 
these differences are not documented. 



6.5 Testing Web-Enabled Services 

Testing of a Web-based service is more difficult than testing a traditional standalone 
system on a single platform. Platform dependence, as stated before, requires more 
testing. To make matters worse, no assumptions can be made about the client’s 
computing environment, JVM implementation, browser versions, etc. On the other 
hand, if we were to force a particular environment then the whole point of a 
Web-based service is lost. 

A good commercial software system should have regression test cases that ideally 
are automated. One lesson we have learned is that automated regression testing of a 
Web-based solution is a must since the Web-based development cycle is short and 
release frequency is high. Automated test execution systems for Web-enabled 
applications are still in their infancy. However, we firmly believe that tool vendors 
will address this shortcoming within a year. This will make testing across different 
browsers much easier. 



7 Conclusions 

We presented two case studies of developing, deploying, marketing, and testing 
Web-enabled services. We found that Web engineering is a very effective way to get tools 
and services, based upon our research, to the field. At the current stage, we do feel 
that Web-enabled solutions require a lot of testing and have some performance, 
security, and scalability problems. As the use of the Internet grows, and the bandwidth 
handicap disappears, we are going to see more software services like the two 
mentioned here replacing the more traditional standalone tools. 
Abstract. A successful development team for Web applications must 
be more versatile and concerned with the requirements of the end users 
than a traditional software development team. The World Wide Web 
has made it possible for new players to participate in such development 
work and also introduced new and radically different technologies for 
applications development. This paper sets out a rationale for 
considering the Web environment as a new medium, calling for a new 
paradigm of system development activities and operation. It presents a 
classification of the participants and a hierarchy for their skills. This 
classification helps in devising a strategy for successful reskilling of the 
development team and in creating the basis for the formulation of the 
corresponding development methodology. The required skills are 
mapped into three broad areas: management, technical and human 
interface. The classification scheme is widely applicable, and is useful 
in formulating and assessing course and training objectives, job 
descriptions, the skills basis for Web projects and the suitability of 
“consultants”. 



1 Introduction 

The World Wide Web has inducted new players in the field of Web application 
development and created a need for new development methodologies and 
management policies. The implications of the Web as a new medium become 
significant when an organisation is faced with the choice of moving or adapting 
existing applications to the Web or redeveloping them from scratch. 

Most proprietary software providers, such as Microsoft, Oracle, IBM, and others, 
now have HTML generating interfaces embedded in their products. However, HTML 
interfaces may not fully exploit the new medium or the full potential of the individual 
products. Further, the existing skills and orientation of the systems developers may 
not be suitable for the new medium and may have to be augmented or integrated in a 
multi-skilled team according to requirements. 

This paper sets out a rationale for considering the Web environment as a new 
medium, calling for a new paradigm of system development activities and operation. 
A classification of the participants, along with a skills hierarchy, is presented. This 
classification helps in Web-based application development and use, in devising 
a strategy for successful reskilling of the development team for Web applications, and 
in creating the basis for the formulation of the corresponding development 
methodology. The classification scheme is widely applicable, and can be directly 
applied in formulating and assessing course and training objectives, job descriptions, 
the skills basis for Web projects and the suitability of “consultants”. 



2 A New Medium, New Paradigm, New Methodology 

2.1 A New Paradigm, “The Users Develop Their Own Applications” 

Networks, internetworking and “groupware” had been around for at least thirty years, 
built up and extended from the computer-to-computer communications developed by 
the computing “buffs” (typically computer/engineering academics). In particular, 
email was firmly established and “gopher” services were available for academics. 

The introduction of the World Wide Web concept in 1991 infused into Internet 
communications a new dimension and a new paradigm. This paradigm introduced 
concepts from interpersonal communications, and in particular from electronic 
publishing. Many Web applications are developed by the users and involve creativity, 
marketing, and presentation skills besides technical knowledge. Until then, the 
Internet hosted software and hardware applications that were all developed by software 
engineers on behalf of the client or user; the applications were typically proprietary 
and developed essentially in isolation from the user. The traditional life cycle 
generally employed to develop such applications did include a user requirement phase 
and provision for feedback on completion of the package. 

Electronic publishing, and the associated area of multi-media, however, is an 
environment for the users to implement their own applications. In this environment 
users develop their own information system “applications” for predominantly 
interpersonal communications. The role of the computer professional is to act as the 
technician to provide the medium and the vehicle, but not necessarily to provide the 
product! 

This infusion of the user developed applications is now clearly seen in the Web 
environment, with Web site publishers coming from almost every facet of human 
endeavours and Web site solutions incorporating much more than just the functional 
coding. 



2.2 The New Players: Capabilities and Limitations 

As HTML currently stands (it may well change with HTML “groaning” under its 
limitations), the following summary indicates the ease of developing simple, but often 
effective Web-based Information Systems. 

a) No special tools are needed to produce an HTML page (although authoring 
packages make it even easier). Special computing professionals are not needed 
for the authoring of a page. Indeed HTML authoring is now inbuilt into almost all 
PC office programs. 

b) Graphics, animation, audio and video can be easily included in an HTML page 
for multimedia effects. Again very little computing expertise is needed. 

c) Anyone can easily upload or publish an HTML page to a hosting computer site. 

d) Simple, but effective information systems can be constructed by simply “linking 
pages”. Again no special computing expertise is required. 
e) Simple applications in the form of “scripts” can be embedded in the HTML page 
for execution in the user’s browser, e.g. JavaScript or Visual Basic scripts. Very little 
software experience is needed, particularly since automatically generated scripts 
are available in most authoring packages. 

f) Simple applications connecting to databases can be embedded in an HTML
page that is processed on the Web server rather than in the user’s browser, using
packages such as Cold Fusion. By making use of dynamically generated pages with
“drill down” links, these applications can provide effective and versatile “query by
context” facilities. Again very little computing expertise is needed.

It is not surprising, therefore, that new players, particularly from the graphics 
design or “creative” backgrounds, set themselves up as “consultants”, by putting 
together attractive and functional Web sites. 

However, when it comes to larger information systems, with aspects of security, 
integrity of data, information system analysis and design, server-side complex 
applications and integration into existing networks, the lack of formal technical 
expertise can certainly result in badly documented, unmaintainable, insecure and
unscalable systems. These systems may also have many unknown side effects due to
poor database organisation and networked server interactions.



2.3 Web-Based Development: Some New Concepts and Features 

Although the original concept of the Web in 1991 was based on a text only, simple 
hypertext linkage environment, the following additional ideas and concepts have 
quickly been incorporated. These ideas and concepts present new features to be 
added to traditional client-server applications. 

The Hypertext Link 

Although the use of buttons and menu lists is firmly incorporated into traditional 
database client-server applications, it has involved an overhead of form design and 
coding. The hypertext link is an almost trivial construct in HTML. In particular, the
ease of incorporating hypertext links into dynamically generated HTML pages makes
possible a relatively new querying scheme. For example, suppose a query to a database
returns data in four columns (fields), and a link specification is embedded for each
column, so that only four links are specified and designed, each making use of the
returned data. If the query then returns, say, 10 rows, the user has a query-by-context
choice of four by ten links, i.e. forty queries for the overhead of only the four
initially specified.
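
The four-by-ten arithmetic above can be made concrete with a short sketch. It is only an illustration in Python, not the scheme used by any particular package: the column names, the query URL `/staff/query` and the helper names are all assumptions introduced here.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical column names and query URL; neither comes from the text.
COLUMNS = ["name", "department", "position", "research"]
QUERY_URL = "/staff/query"


def link_for(column, value):
    """Build one query-by-context link from a column name and a cell value."""
    href = QUERY_URL + "?" + urlencode({"field": column, "value": value})
    return '<a href="{}">{}</a>'.format(escape(href), escape(value))


def rows_to_html(rows):
    """Render query results as an HTML table with one link per cell."""
    lines = ["<table>"]
    lines.append("<tr>" + "".join("<th>{}</th>".format(c) for c in COLUMNS) + "</tr>")
    for row in rows:
        cells = "".join(
            "<td>{}</td>".format(link_for(c, v)) for c, v in zip(COLUMNS, row)
        )
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def count_links(html):
    """Count the anchors in a generated page."""
    return html.count("<a href=")


# Ten returned rows and four link specifications give forty links.
sample_rows = [("Staff %d" % i, "Dept", "Lecturer", "Web") for i in range(10)]
page = rows_to_html(sample_rows)
print(count_links(page))  # prints 40
```

Only `link_for` has to be designed; the number of links the user can follow grows with the number of rows returned.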

Associated with the hypertext link is its use in image maps and other graphical 
environments. Again, the ease of incorporating this form of navigation is not matched 
in traditional database systems. In moving existing applications to the web, 
proprietary software developers may ignore the advantages of the simple hypertext 
link and try to emulate their existing GUIs. 

Table 1: Example of dynamic “querying” by context

The database “Staff” has fields for Staff Name, Department, Position and
Primary Research Area. A table is set up for the results of a query to this
database. The first row gives the column headings. Each cell in the second row
displays the field value and also a link to another query. The query makes use
of the field value for that cell to build the link. The links could drill down for
further information, such as for “Research”, to query the research database and
return all staff involved in the returned research area.

Name      Department    Position        Research
link      link          link            link

If the query returns four records, then sixteen links have been dynamically
generated.

Name      Department    Position        Research
Fred      Physics       Lecturer        Quantum Mechanics
Sue       Maths         Lecturer        Calculus
John                    Tech Support    Fross
Linda     Computing     A/Prof          Web

By context in the table, the user can link to their next area of interest.





A Common User Interface (CUI): From GUI to CUI 

The Web browser interface provides a common user interface for all applications. 
This concept provides the user with a “window” into the information system for all 
applications. The traditional approach has been that each application provided its 
proprietary user interface and the user had to learn the operation of each individual 
interface. The advantage of seamlessly integrating applications for the user is now 
possible. In fact the browser interface is now being inbuilt into most office type 
software. As with the hypertext link, proprietary software developers may find this a 
limitation, and in moving existing applications to the web, may try to emulate their 
existing GUI. This may show up in anything from assumptions about screen size and
resolution to the actual placement of data on the screen. Web users now expect an
easy-to-use environment, and the functionality of an application has to be matched by
its user “friendliness”.

Machine and Network Independence 

Another concept that the Web introduces is the implication of machine and network 
independence. Again, this may be a problem for moving existing applications to the 
web and introduces the “global” aspect not always considered in traditional system 
design. 

User Involvement

With users being capable of developing their own applications (even if they are 
simple linked-pages), the design of web based information systems needs to now 
include either the integration of these applications, or even the tailoring of 
applications by “educated” users. User/developers may even demand to be able to 
“hook in” or access all or part of proprietary applications. 

Multimedia Environment 

The Web introduces the wide range of multi-media capabilities into all applications. 
Both technical and “creative” knowledge is now required to effectively use multi- 
media. 

Limited Client/User Interface and the Stateless Connection 

Web user-server interaction is essentially “half-duplex”, that is, HTML pages are 
events on their own. For example, partial information is not exchanged between 
browser and server. The basic display of information is an HTML page. The use of 
browser-side objects and scripts does provide a measure of interactivity, however if 
server interaction is required, a whole page must be transmitted for each interaction. 

Additionally, the Web protocol (HTTP) is stateless, that is, each request for an HTML
page is independent of any other. Various methods (such as cookies or ID passing) have
to be incorporated to keep track of an individual user in a session or users in an
application.
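
As a rough illustration of the cookie approach mentioned above, the following Python sketch keeps a per-user visit count across otherwise independent requests. The cookie name `session_id`, the in-memory `SESSIONS` store and the function `handle_request` are assumptions made for this example, not part of any system described here.

```python
import secrets
from http.cookies import SimpleCookie

# In-memory session store; a real site would persist this (assumption).
SESSIONS = {}
COOKIE_NAME = "session_id"  # hypothetical cookie name


def handle_request(cookie_header):
    """Return (session_id, visit_count, set_cookie_header_or_None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    session_id = morsel.value if morsel is not None else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown or missing ID: start a new session and ask the browser to keep the ID.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"visits": 0}
        out = SimpleCookie()
        out[COOKIE_NAME] = session_id
        set_cookie = out.output(header="Set-Cookie:")

    SESSIONS[session_id]["visits"] += 1
    return session_id, SESSIONS[session_id]["visits"], set_cookie


# First request carries no cookie; the second returns the ID the server issued.
first_id, first_visits, first_header = handle_request(None)
second_id, second_visits, second_header = handle_request(COOKIE_NAME + "=" + first_id)
```

ID passing works the same way, except that the identifier travels in each link or hidden form field instead of in a cookie header.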

Use of Objects and Distributed Processing 

Another concept included in Web development is the actual physical location of the 
various web components. For example, in a database application, does the application 
querying the database reside as a stored procedure in the database server, or is it part 
of the HTML page on the Web server, or is a separate application server to be used? 
Additionally, an application object may be moved to the client’s browser in the form
of Java applets, ActiveX or CORBA-type objects. There are documentation, server
memory, CPU time, and network bandwidth considerations. There are advantages and 
disadvantages for all these approaches, the actual approach taken depending on a 
number of factors. 

Additionally, the scope of the service has always to be considered, the scope being 
whether the applications are internal as in an intranet, or external, and if access 
restrictions apply to groups of users. 

Use of Feedback 

Another concept of applications in the Web environment is the need to log usage and
provide usage statistics for the users (in addition to network performance). As part
of an Information System, the number of “hits”, who made them, when they occurred,
and so on, may be needed as part of the rationale for the system.
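
A minimal sketch of such usage statistics follows, assuming the server writes its access log in the widely used Common Log Format; the sample lines, addresses and page names are invented for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: host ident user [time] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def summarise(lines):
    """Count hits per page, per requesting host and per hour of day."""
    hits_by_page = Counter()
    hits_by_host = Counter()
    hits_by_hour = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits_by_page[match.group("path")] += 1
        hits_by_host[match.group("host")] += 1
        hits_by_hour[when.hour] += 1
    return hits_by_page, hits_by_host, hits_by_hour


# Invented sample entries.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 +1000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:58:02 +1000] "GET /staff.html HTTP/1.0" 200 512',
    '192.0.2.1 - - [10/Oct/2000:14:01:11 +1000] "GET /index.html HTTP/1.0" 200 2326',
]
pages, hosts, hours = summarise(SAMPLE_LOG)
```

The three counters answer, respectively, how many hits each page received, who made them, and when they occurred.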



3 Building a Successful Development Team for Web Applications

By considering the Web as a new medium emerging from LAN based information 
systems and electronic publishing, a three dimensional skills space can be constructed 
for the development of Web-based information systems, the dimensions being 
management, technical and human interaction. The management vector is associated 
with the skills needed to coordinate, regulate and integrate the Web system with the 
“organisation” and existing information systems. The technical vector includes 
computing, networking and internet communications skills. The human interaction 
vector is associated with graphics design, layout, “human communications” and 
presentation skills. 




Figure 1. The skills dimensions, each axis representing a skill area for Web “players”

3.1 A Proposed Classification of the “Players” 

From our experience so far, six broad categories for participants in the Web
environment can be defined. This classification is hierarchical, with overlapping
areas. In each category, different levels have been defined, based on skill or
behavioural criteria. In a typical organisation, one person may belong to more than one
of these categories and perform more than one role (a typical duplication is the end
user or “Web User” acting as a Content Developer/Provider and Site Publisher). Terms
like "Web editor" or "Web master" have not been used due to a lack of precision in
their current usage.

The categories are: 

1. Web User 

2. Web Content Developer/Provider 

3. Site Publisher 

4. Web Developer (or Project Officer) 

5. Web Support Officer, and 

6. Web Manager. 

Web User

This represents the end user or client of the Web environment. The Web user
makes use of a computer or computer terminal with browsing software. This represents
the entry level into the Web environment. Although at the basic level a "Web user"
only needs to be able to operate a PC with packaged software, there is a higher level
of operation requiring navigation skills, use of complex search engines (if
available), the ability to transfer files, awareness and invocation of security
measures, and integration of the browsing with other desktop software. Experience with
staff, undergraduate students and high school students indicates that 2 to 4 hours in a
formal program is usually sufficient to take a novice to an acceptable level in browsing.

All the other categories require the skills/knowledge base of this category. 

Content Developer/Provider and Site Publisher

These two categories work together. They may be combined in one individual, or for 
larger projects may represent two teams. These categories span a wide range of skills 
and have much in common. Basically, the Content Developer/Provider produces the 
material for publishing, perhaps for any site, while the Site Publisher is the technician 
to implement the material on a particular site. The Content Developer/Provider needs 
specialised skills in the human interface and interaction, coupled with creativity, 
multi-media knowledge and graphical design skills. The Site Publisher needs 
specialised skills in database interaction, HTML operation, client/server functionality, 
server-side applications, CGIs and equivalent, and multi-media applications. 

Web Support Officer 

This category is one that may be taken up by clerical staff in an organisation as part of 
administration support. The responsibilities include the updating of material on the 
Web site. This clearly needs some site publishing skills. The updating and 
maintenance may include the maintenance of databases that feed the Web site. This
will require operational knowledge of the databases and methods of accessing them
(and may require special security and access rights). In order to maintain quality
control over the operation of the site, site statistics are needed, along with a method
of reporting on them to the users and others. This function is categorised as a
function of the Web Support Officer (it is also part of the functions of the Web
Developer and Web Manager). There could well be a number of Web Support Officers in a
typical organisation, covering various areas of the site.

Web Developer 

For more than the simplest of sites, there are a considerable number of skills required 
for a Web Developer. As a typical application of an information system, the Web 
Developer needs to be aware of client-server relationships, the user requirements, the 
organisational requirements and specifications and the hardware/software resources. 
The classification system assigns the following roles to this position: - 

a. Analyse user requirements and design procedures to form a structure for the 
site. 

b. Identify areas of training and education 

c. Set up maintenance and feedback procedures

d. Develop and implement "policies" for the operation of the site 

e. Implement security and access rights for the site 

f. Produce documentation on the database interactivity, server side and client 
side programming and similar 

g. Have a working knowledge of multi-media concepts, specifications and 
techniques 

h. Be able to publish Web pages as required 

i. Have a good working knowledge of the hardware/software platforms suitable 
for the site 

It is recommended that user-orientated approaches to the Web based information
system be adopted, as given by Paul (1995) and Hansen & Deshpande (1996).

The skills and knowledge base of this category are proposed as a starting point for 
the classification of the term "consultant". 

Web Manager 

This category requires technical skills similar to a Web Developer, with additional 
management skills in the areas of policy upkeep and generation, the ability to oversee 
the training and education, the ability to promote the site along with the coordination 
of the site maintenance, development and publishing activities. 

The category of Web Manager overlaps with network administration and involves
technical skills in the hardware/software operation of the site, along with network
and communication skills. The technical side includes the methods of log file
generation, database operation, security and access rights, and server operation in
terms of CGI or similar extensions. The Web Manager also needs to be able to
measure the effects of traffic on bandwidth and have a knowledge of the performance
of the site.

In each of these categories, a further hierarchical classification is made, based on 
the level of skills and the knowledge base required. These are listed in Table 2. 



3.2 An Organisational Structure for the “Players” 

Based on a case study by the WebISM group at the University of Western Sydney, 
Australia, a workable organisational structure for the above categories of players in 
terms of a Web-based information system is shown in Figure 3. It may well be that
an individual takes on more than one role, or that some positions are expanded into a
team. This structure brings together, within the web information system, the
traditional LAN based players and the “new players” from the electronic
publishing/human interaction area. The feedback loop between a subset of the Users and
the content developers/providers is a mechanism for evolving, user-centric system
development. This loop covers involvement from the simplest of levels, such as
members of the organisation developing their own web pages, through to interaction
with, and applications for, the system’s databases.

Table 2. Skills and Knowledge-base for Web Developers and Users

Category        Functions, Necessary Skills/Knowledge-base

User            Uses the Web environment as an information system for data
                retrieval, searching, interaction and communication.
  Level 1       Simple use of computer packages and a browser for information
                retrieval; elementary use of search engines and hypertext links;
                an understanding of URLs, saving files, running applications and
                bookmarks.
  Level 2       Full understanding of searching algorithms, interactivity with
                forms, use of security methods, ftping, caching and navigation.

Content
Developer/
Provider
  Level 1       As for Site Publisher Level 2 plus knowledge of graphic design.
  Level 2       As for Site Publisher Level 3 plus working knowledge of
                multi-media design and interpersonal communications and the
                human interface in electronic media.
  Level 3       Specialised knowledge in areas of marketing, education,
                psychology, or human interactions.

Site Publisher
  Level 1       Ability to use an authoring package to produce HTML pages with
                links and bookmarks and to incorporate graphics; also to follow a
                procedure for uploading ("publishing") the HTML files to the Web
                site.
  Level 2       Ability to use forms, image maps, tables, frames generated from
                an authoring package; also to modify/add HTML tags directly with
                a text editor; a working knowledge of HTML in terms of its
                operation and limitations; ability to use multi-media concepts in
                page design.
  Level 3       Ability to use a client side scripting language
                (JavaScript/VBScript); also to provide database interaction with
                generation of queries, knowledge of SQL, database operation,
                formatting of output and CGI concepts; requires an operational
                overview and knowledge of the hardware/software platforms.
  Level 4       As for Level 3 plus: ability to use full server & client side
                programming techniques; a working knowledge of user interfaces
                and client-server techniques; a sound working knowledge of the
                hardware/software platforms and security/access methods.

Web Support     Responsible for the updating of databases/pages; monitoring,
Officer         collecting and collating of feedback on the site operation and
                disseminating information as needed.
  Level 1       Same as Site Publisher Level 1.
  Level 2       Ability to monitor/access log files to produce statistics and
                usage; operational knowledge of the databases in use, in terms
                of data entry & editing; knowledge of the structure and
                organisation of the site.

Web Developer   Analysis of user requirements and design & development of Web
                based information systems for use on a host Web site; also
                identifies areas of training and education and works with Web
                Manager; well developed project management and professional
                skills; typical category for a "Web Consultant".
  Level 1       Skills as for Site Publisher Level 3; plus ability to identify
                the needs/resources/areas and the nature for training/education;
                ability to set up maintenance and feedback procedures;
                background in user interfaces and general information systems
                concepts; capability to implement site "policies" and necessary
                security/access procedures.
  Level 2       In addition to Level 1, knowledge of multi-media techniques plus
                the technical knowledge as for Site Publisher Level 4; ability
                to produce/deliver the required education/training.

Web Manager     In charge of the total site, its operation, database and server
                side programming, security, access rights, logging, and methods
                of publishing & maintenance. Overlaps with network
                administration.

                Requires operational knowledge of database interaction, server
                and client side Web programming, the hardware/software
                platforms, the logging and event monitoring procedures and
                security/access methods; CGI (or similar) concepts as needed in
                terms of HTML and site operation; also full operational
                understanding of uploading, publishing and ftp facilities.

                Responsible for policies, review, monitoring of operation, user
                feedback, education & training, promotion of the site,
                coordination of site developers, maintenance and publishers.

                Requires technical skills as for Site Publisher Level 2, along
                with information systems concepts and basic management skills.
                Additional skills in marketing and the ability to coordinate IT
                with all participants in the Web environment.

Between the categories there are overlaps, and the hierarchy from category to
category is not fully inclusive, as shown in the skills space diagram (Fig. 2).




Figure 2. Skills space diagram 

An important component not clearly shown is the inclusion of skills training, which in
the web environment can be through formal programs or through self-directed
learning. The duties of the Web Developers and the Web Manager include training,
system structure and guidance for an evolving system.




Figure 3. Organisational structure of the Web development team 



The skills hierarchy as in Table 2 provides a guide for the technical, managerial 
and human interaction skills base needed for this structure to operate. 



3.3 The Development Team 

All the above provides the framework for putting together a successful web 
development team. The main players in such a team include at various levels, the 
Web Manager, Web Developer, Site Publisher, Content Developer/Provider and the 
Web Users, the mix and match of involvement depending on the actual applications, 
organisation and existing staff. In deciding to either move existing or legacy systems 
to the web or to develop applications from scratch, the following additional factors 
bearing upon the composition of a Development Team may be borne in mind. They 
are based on our own experience. 

3.3.1 Movement of Legacy Systems to the Web 

Most existing non-web applications, particularly database applications, have been
developed by proprietary software providers. The background and history of these
systems typically reflect the axiom that an analysis of the proposed system will
identify a set of design criteria that can then be coded, the emphasis being on a
functional solution rather than on user satisfaction or ownership. The pitfalls of
this traditional analysis/design approach have been pointed out in a number of studies
of the failures of large computer projects. Applied to web systems, this approach does
not include the “new players” and the human interface skills applicable to the web
medium.

Most software providers currently supply web interfaces that attempt to mimic the
original applications. Usually, the interface still has to be modified and tailored,
which can be a time-consuming process. The implicit pitfall in this approach is
illustrated by the analogy of producing black and white movies after the change of
medium to colour. In operational terms, although there may be savings in developing the
functionality, the functionality is usually locked away in proprietary systems and not
available to the “new players”, and may not even have “hooks” for integration into
other web applications.

It is not surprising that existing proprietary information system providers
(particularly again in the database area) wish to keep all the application development
proprietary (along with their own web “consultants”). As a typical scenario, consider
the use of stored procedures in a database. The web environment introduces a range of
choices for the location of applications. They can reside in the database, reside as
objects on the server, be embedded in server templates/scripts, or be objects
moved to the browser. There are advantages and disadvantages for each approach.
Further, for a given application, the need for documentation, the use of existing 
algorithms, communication bandwidth and integration with other applications also 
need to be considered in detail. 

3.3.2 New Applications 

Alternatively, if all applications are designed from scratch, the time factor and degree 
of training for the new players may not be cost effective in the short term. The team 
balance here needs the input from the traditional software areas for workable 
structures and the ability to scale up and develop the applications in the future. 



4 Conclusions 

From a skills framework, and a suitable organisational structure, a successful 
development team for web applications can be built. The skills profile of the players 
can be considered as a three dimensional skills space with vectors in the management, 
technical and human interface areas. This skills framework can also assist in assessing 
training requirements, job classifications and the selection of “consultants”. 



References 

1. Hansen S. & Deshpande Y. (1996) Utilising a User Hierarchy in Information 
Systems Development, Proceedings of the 1996 Australasian Conference on 
Information Systems, Hobart, 10-13 December 1996, pp 299-306 

2. Patel N. (1995) Tailorable Information Systems: Thesis Assertions,
Counterpositions and Confutations, LIST Workshop Series 2, Department of
Computer Science & Information Systems, Brunel University

3. Paul R. (1994) Why Users Cannot "Get What They Want", Computer Science
Department, Brunel University

4. Hansen S., Deshpande Y., & Murugesan S., (1997) Identifying a skills hierarchy 
for participants in the Web Environment (The "Who needs to know what") 
Proceedings of the 1997 AUSWEB Conference, Gold Coast, Australia. 




A Case Study of a Web-Based Timetabling System

Shu Wing Chan¹ and Weigang Zhao²

¹ Up’n’Away Net Solutions
PO Box 88, Floreat Forum 6014 WA Australia
chansw@upnaway.com

² School of Information Systems, Curtin University
GPO Box U1987 Perth 6845 WA Australia
zhaoc@cbs.curtin.edu.au



Abstract. This paper first describes various interfacing mechanisms 
between the Web and databases. Secondly, a comparative analysis of 
different web-database interfacing mechanisms is presented. Thirdly, a 
web-based timetabling system implemented using one of the 
mechanisms is described. Finally, conclusions and future work are 
reported. 



1 Introduction 

With the increasing popularity and advancement of the Web technology, many legacy 
information and database systems are being migrated to the Internet and the Web 
environments. It is believed that the integration of the Web and database technology 
will bring many opportunities for creating advanced information management 
applications [1]. Web-to-database tools have been made available for developing such 
applications. With live Web databases, travel agents can keep flight schedules and 
fares updated; businesses can update inventory lists and prices; customers can look up 
the latest prices and order products online; employee training can be an interactive 
experience; and a much greater range of information publishing applications is 
possible. 

Taking simple data from a database and placing it on the Web is a relatively simple 
task. However, in most cases, the corporate data is maintained in a variety of sources, 
including legacy, relational, and object databases. It is much more complicated when 
these diverse data sources must be queried or updated [2]. As the level of
complexity of the applications increases, there is growing concern about the way
that these applications are developed. "Web Engineering", as a new discipline, has
been coined and promoted by a series of workshops and seminars [3].

There are many players in the industry taking this challenge. These include major 
database vendors, mainframe vendors, third party software firms, Web browser 
vendors, and Web server vendors. A wide range of tools and philosophies has been 
proposed for connecting and integrating the Web and databases [4] [5]. In a previous
paper [6], we presented a web-based configuration database application implemented
using different web-to-database interfacing techniques. This paper describes a
real-life web-based timetabling system developed and deployed at Curtin Business
School, Curtin University of Technology.

S. Murugesan and Y. Deshpande (Eds.): WebEngineering 2000, LNCS 2016, pp. 236-244, 2001.

The remainder of the paper is organised in three sections. Section 2 describes
various web-to-database interfacing approaches and presents a comparative analysis 
of different approaches. Functions of a web-based timetabling system are described in 
Section 3. Conclusion and future work are reported in Section 4. 



2 Web-to-Database Interfacing Approaches 

Delivering data over the Web is cost effective and fast, and gives Internet users easy
access to databases from any location. Users hope to access databases via Web
browsers with the same functions as provided by normal database application
software. Businesses want to provide their users or customers with various functions
such as purchasing goods, tracking orders, searching through catalogues, receiving
customised content, and viewing interesting graphics. Web-to-database integration
has therefore become central to the construction of corporate information systems.

Making database information available to Web users requires converting it from 
the database format to a markup language such as HTML or XML. Database 
packages store information in files optimized for quick access by front-end programs. 
When a Web server sends information to a client, the internal database format must be 
converted to HTML so that it is displayed correctly [7]. 
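
The conversion step can be sketched in a few lines of Python. This is an illustration only: it uses the standard `sqlite3` module as a stand-in for whatever database is in use, and the `staff` table, its contents and the function name `query_to_html` are assumptions made for the example.

```python
import sqlite3
from html import escape


def query_to_html(connection, sql, params=()):
    """Run a query and convert the result set to an HTML table."""
    cursor = connection.execute(sql, params)
    headers = [description[0] for description in cursor.description]
    parts = ["<table>"]
    parts.append("<tr>" + "".join("<th>%s</th>" % escape(h) for h in headers) + "</tr>")
    for row in cursor:
        # Escape every value so that characters such as & and < display correctly.
        cells = "".join("<td>%s</td>" % escape(str(value)) for value in row)
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical data held in an in-memory database.
staff_db = sqlite3.connect(":memory:")
staff_db.execute("CREATE TABLE staff (name TEXT, department TEXT)")
staff_db.executemany(
    "INSERT INTO staff VALUES (?, ?)", [("Sue", "Maths"), ("Fred", "R&D")]
)
html_page = query_to_html(staff_db, "SELECT name, department FROM staff ORDER BY name")
```

Escaping each value is the part that makes the output display correctly: a department stored as `R&D` has to reach the browser as `R&amp;D`.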

To build a bridge between Web and enterprise databases, a number of alternative 
technologies and architectures have been made available. These include: 

• CGI (Common Gateway Interface) is a Web standard for accessing external
programs, to integrate databases with Web servers. The CGI dynamically
generates HTML documents from back-end databases;

• Web server APIs, such as Microsoft's Internet Server API (ISAPI), Netscape
Server API (NSAPI), and Apache's server API, are invoked by third party software
to access remote databases;

• Web-ODBC (Open Database Connectivity) gateways rely on an open API
(Application Programming Interface) to access database systems;

• JDBC (Java Database Connectivity) is used in the Java programming language to
program Java applets to access back-end database servers;

• CORBA (Common Object Request Broker Architecture) API is the key software 
component of a set of distributed systems standards. It provides a high level 
interface to enable objects to communicate with other objects, regardless of 
network location or platform. 

• DCOM (Distributed Component Object Model) is the distributed extension to 
COM (Component Object Model) that builds an object remote procedure call 
(ORPC) to support remote objects. 

Each of the above technologies has strengths and weaknesses. Several factors 
should be considered when making selections. These include the complexity of data, 
the speed of deployment, the expected number of simultaneous users, and the 
frequency of database updates. However, new technology is emerging and several 
tools are already available that make this Web-to-database access optimised for 
improved performance [2]. 



2.1 CGI

Common Gateway Interface (CGI) is the oldest and the most commonly used method
for implementing a Web database gateway. CGI scripts/programs can be written
in a number of programming languages, including Perl, C/C++, TCL and Visual
Basic. CGI is a standard feature on all web servers across different platforms. In
CGI, an HTML form is used as the graphical user interface for obtaining the user
request. The server then transfers the request to the gateway program. The gateway
program either processes the request itself or passes the request to another
program/system to process. The program/system formats the results as HTML and
transfers them back to the web server. The web server presents the result back to
the client.
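
The request flow described above can be sketched as a small gateway program. The sketch is in Python rather than any language used by the authors; it relies only on the CGI convention that the server passes the form data in the `QUERY_STRING` environment variable and reads the response from standard output. The form field `unit` and the function name `respond` are hypothetical.

```python
import os
from html import escape
from urllib.parse import parse_qs


def respond(environ):
    """Build the CGI response (header, blank line, HTML body) for one request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    unit = query.get("unit", [""])[0]  # "unit" is a hypothetical form field
    body = "<html><body><p>You asked for: %s</p></body></html>" % escape(unit)
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server sets QUERY_STRING before it spawns this program.
    print(respond(os.environ), end="")
```

A real gateway would query the database at the point where `body` is built; everything else is the fixed plumbing between the form, the server and the program.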

Currently, Perl is the dominant language for writing CGI applications. Perl itself
does not have any built-in database support, but it has a large range of modules,
including ones for database access. Two of the widely used database modules are DBI
(Database Interface) and Win32::ODBC. These modules provide a uniform API for
accessing different vendors' databases. As a result, only the driver named in the code
needs to be changed when the database is changed.

The advantage of the CGI approach is that it is simple to implement and freely 
available on all web servers on different platforms. However, it has several 
limitations. A process is spawned for each request, which is time-consuming and 
expensive in system resources. Moreover, a database connection cannot be maintained 
between requests, so each time a CGI script queries the database, a new connection 
is opened between the CGI program and the DBMS. 



2.2 Server API 

An alternative way to modify or extend the abilities of the server is to use its API. 
APIs allow the developer to modify the server's default behavior and give it new 
capabilities. In addition to addressing some of the drawbacks of CGI, the use of an 
API offers other features and benefits, such as the ability to share data and 
communications resources with a server, the ability to share function libraries, and 
additional capabilities in authentication and error handling. Because an API 
application remains in memory between client requests, information about a client can 
be stored and used again when the client makes another request [8] [9]. Examples 
include Netscape's NSAPI, Microsoft's ISAPI, and Apache API. 
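
The benefit of staying in memory between requests can be illustrated with a small sketch. It does not use NSAPI, ISAPI or the Apache API; the class InProcessHandler, the function cgi_style_handle and the in-memory SQLite database are hypothetical stand-ins chosen only to contrast the two lifetimes.

```python
import sqlite3


class InProcessHandler:
    """Hypothetical stand-in for an extension loaded through a server API."""

    def __init__(self):
        # Opened once when the server loads the extension, then reused.
        self.conn = sqlite3.connect(":memory:")
        self.connections_opened = 1
        # Information about each client, kept in memory between requests.
        self.visits = {}

    def handle(self, client_id: str) -> str:
        self.visits[client_id] = self.visits.get(client_id, 0) + 1
        (answer,) = self.conn.execute("SELECT 1 + 1").fetchone()
        return f"client={client_id} visit={self.visits[client_id]} db={answer}"


def cgi_style_handle(client_id: str) -> str:
    """CGI-style contrast: a fresh connection and no memory of the client."""
    conn = sqlite3.connect(":memory:")
    try:
        (answer,) = conn.execute("SELECT 1 + 1").fetchone()
    finally:
        conn.close()
    return f"client={client_id} visit=1 db={answer}"
```

With the in-process handler, a second request from the same client sees visit=2 and reuses the one open connection, whereas the CGI-style function starts from scratch every time.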



2.3 Active Server Pages (ASP) 

Active Server Pages (ASP) is a server-side scripting technology developed by 
Microsoft. It allows scripts to be embedded within HTML pages in any scripting 
language that complies with the ActiveX technology. Currently, you can use VBScript, 
JScript (Microsoft's implementation of JavaScript) and PerlScript. ASP is shipped 
with Microsoft Internet Information Server (IIS) and O’Reilly Web Server Pro, which 
run on Windows 95/98/NT platforms. With third-party plug-ins such as Chilli ASP, ASP 
can be run on other web servers and platforms. ASP provides connectivity to any 
ODBC-compliant database. With an HTML form as the front end, client-server 
applications can be built with ASP. It supports the execution of SQL statements 
and stored procedures. 

Active Server Pages execute faster than traditional CGI programs because they run 
in the same process space as the web server, so there is no need to create a 
separate process. ASP therefore uses less memory and fewer system resources. 
However, for the same reason, an error in a script can crash the web server. 
Although ASP was designed primarily for Microsoft's IIS, it is now available on 
Unix, Lotus Domino, and Netscape Enterprise servers [10]. 



2.4 Java Servlets 

A Java servlet is a generic server extension that expands the functionality of a 
server, and it can be used to replace CGI. It runs inside a Java Virtual Machine 
(JVM) on the server, so it is secure and portable [11]. Unlike a Java applet, a 
servlet does not require any Java support on the client, as all the processing is 
done on the server side. Since servlets are written in Java, they have full access 
to Java APIs and third-party component classes. Servlets are also portable across 
operating systems and web servers. They are more efficient than CGI programs 
because they execute within the web server’s process space and persist between 
invocations. 

A database gateway can be written as a Java servlet using JDBC. JDBC is a Java 
SQL API for accessing databases. It provides a set of classes and interfaces for 
writing database applications using a pure Java API. Like ODBC, JDBC provides a 
uniform interface to virtually any relational database. With the JDBC API, a single 
program can send SQL statements to the appropriate database. JDBC's design draws 
on ODBC, and both are based on the X/Open SQL CLI (Call Level Interface); JDBC, 
however, builds on the style and virtues of Java. 



2.5 ODBC and JDBC 

ODBC and JDBC are types of database access middleware. Database vendors and 
several third-party software houses offer ODBC and JDBC drivers for a variety of 
databases and operating environments. From a network administrator's point of view, 
they consist of client and server driver software (i.e., program files). From a 
programmer's point of view, they are APIs that the programmer inserts in his or her 
software to store and retrieve database content. While a system analyst perceives 
ODBC or JDBC as a conceptual connection between the application and the database, 
database vendors regard ODBC and JDBC as ways to entice customers who say they 
want to use industry standard interfaces rather than proprietary ones. Managers of 
data processing departments view ODBC and JDBC as insurance: interfaces that offer 
them some measure of flexibility should they find it necessary to replace one 
database product with another [12]. 

ODBC technology now allows Web servers to be used to directly connect with 
databases, rather than using third party solutions. JDBC can also directly access 
server ODBC drivers through a JDBC/ODBC Bridge driver, available from SunSoft. 
ODBC driver vendors are also building bridges from ODBC to JDBC. JDBC is 
intended for developing client/server applications to access a wide range of backend 
database resources. 

3 Functions of a Web-Based Timetabling System 

This web-based online timetabling system was initially developed by the School of 
Computing and has been enhanced and modified for Curtin Business School. The 
system comprises a set of CGI scripts written in Perl 5 with OraPerl. Form 
validation and some user interface enhancements are performed with JavaScript. The 
system provides two sets of functions: one used by students and another used by 
timetable administrators. The system runs on a Sun workstation with Apache 1.3.9 
and Oracle8 Enterprise Edition Release 8.0.5.0.0 - Production. 

3.1 Functional Components Used by Students 

The student is welcomed by the login page, where the username, password and email 
address are needed to access the system. The edit class page comes up next, where 
the student enters up to 5 unit index numbers. Again, all the inputs are validated 
with JavaScript before they are sent to the server for further processing. On this 
page, students can view what they have enrolled in so far, remove an enrolled class, 
or print out the timetable. The system also checks that the total number of units 
in which each student enrolls does not exceed the permitted 5. 

The available classes for each unit are then displayed for the student to select. 
If no more class seats are available for a unit, the student is put on the waiting 
list automatically. Once the classes are selected, the system checks whether the 
student is already enrolled in the unit. If everything is OK, the student can click 
the button to enroll in the units, and the details are recorded in the database. 

The functions of some scripts are described as follows: 

Login.cgi: This script is used to authenticate students when they enter the system. 
Three user inputs are required: the student number (which acts as the username), 
the date of birth (which acts as the password) and the email address. After filling 
out all the required fields, the student clicks the login button. If any of the 
inputs is missing, the student is prompted with a dialog box (implemented using 
JavaScript). The script opens a database connection, retrieves the student number 
and date of birth, and compares them with the user input. When the username and 
password are valid, it also updates the email address if none is stored or if it 
has changed. Figure 1 shows the login page of the timetabling system used by 
students. 

Edit.cgi: This script presents the main page, where students can manipulate their 
enrollment. If the student has already enrolled in at least one unit, the timetable 
is displayed at the top. There is also a hyperlink to print.cgi, where the timetable 
is displayed on a separate page and can be printed out. To the left of each enrolled 
unit, the student can click the ‘delete’ hyperlink, which calls remove.cgi to remove 
the enrolled unit. If the student has not enrolled in anything yet, he/she can fill 
in the five boxes with the units he/she would like to enroll in. There is an option 
to display a box with all the unit information for reference. 

Times.cgi: This script takes as input the student number along with the units 
selected for enrollment. The available classes for each selected unit are listed in 
an option box. The student highlights the class he/she would like to enroll in and 
clicks ‘Continue’. If no class is available for a unit, the student is put on the 
waiting list and an appropriate message is displayed on the screen. 







Enrol.cgi: This script writes the student's enrollment details to the database if 
the selected classes are still available. If the selected classes are no longer 
available when the student submits, the student is put on the waiting list 
automatically. After the database has been updated, the student can print out the 
timetable or exit the system. Figure 2 displays a student timetable generated by 
the system. 
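
The enrol-or-wait-list decision described for these scripts might look roughly like the sketch below. It is written in Python with an in-memory SQLite database rather than Perl with OraPerl, and the table layout, the function names make_db and enrol, and the returned status strings are all assumptions made for illustration; only the limit of five units and the automatic wait-listing come from the description above.

```python
import sqlite3

MAX_UNITS = 5  # limit on units per student, as stated in the text


def make_db() -> sqlite3.Connection:
    """Create a hypothetical schema; the real Oracle tables are not documented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE classes (class_id INTEGER PRIMARY KEY, unit TEXT, seats INTEGER);
        CREATE TABLE enrolments (student TEXT, class_id INTEGER, unit TEXT);
        CREATE TABLE waitlist (student TEXT, unit TEXT);
        """
    )
    return conn


def enrol(conn: sqlite3.Connection, student: str, class_id: int) -> str:
    """Enrol the student in a class, or wait-list them if the class is full."""
    row = conn.execute(
        "SELECT unit, seats FROM classes WHERE class_id = ?", (class_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown class {class_id}")
    unit, seats = row

    # Units the student already holds, used for both checks below.
    units = [u for (u,) in conn.execute(
        "SELECT unit FROM enrolments WHERE student = ?", (student,))]
    if unit in units:
        return "already enrolled"
    if len(units) >= MAX_UNITS:
        return "unit limit reached"

    (taken,) = conn.execute(
        "SELECT COUNT(*) FROM enrolments WHERE class_id = ?", (class_id,)
    ).fetchone()
    with conn:  # commits on success, rolls back if an exception is raised
        if taken < seats:
            conn.execute(
                "INSERT INTO enrolments VALUES (?, ?, ?)", (student, class_id, unit)
            )
            return "enrolled"
        conn.execute("INSERT INTO waitlist VALUES (?, ?)", (student, unit))
        return "waitlisted"
```

A production version would also need to guard against two students taking the last seat at the same moment, for example by locking the class row; the sketch leaves that out.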




Figure 1 The login page of the timetabling system used by students 
Figure 2 A student timetable 



3.2 Functional Components Used by Administrators 

The timetabling administration system has six major components: student 
management, unit management, class management, waiting list management, 
statistics/report, and upload/download. The timetable administrator can manipulate 
the data in the database via the Web interface, for example by adding classes, 
editing unit details, moving students from the waiting list to new classes, and 
emailing all students about any changes. The system also allows the timetable 
administrator to upload large amounts of username and password or class information 
via the Web interface; the data is then inserted into the database automatically. 
An appropriate error message is displayed if a problem is encountered. Moreover, 
the administrator can download the class and/or unit enrolment information at any 
time. The HTML interface enables a person without any database or SQL knowledge 
to carry out the tasks mentioned above. The functions of each component are 
described in the following sections. 

3.2.1 Student Management 

This component allows the timetabling administrator to add, edit, and search student 
details. The administrator can also send email to all the students on the system to 
announce important changes. 

3.2.2 Unit Management 

This component is used to add, edit, and search the details of a unit. It can also list the 
quota of a lecture, seminar, tutorial, and lab in a unit. 

3.2.3 Class Management 

This component is used by the timetabling administrator to add, edit, list, and 
delete all the details of a class, such as class id, unit index, day, time, 
location, type, etc. It can also be used to email the students of a particular unit 
or class. 

3.2.4 Waiting List Management 

This component lists the students on the waiting list for a particular unit. The 
timetabling administrator can also use this component to move students on the 
waiting list for a particular unit to new classes that the administrator creates. 
All the students are listed at the top with an option box next to each of them. The 
administrator ticks the boxes to select the students to be moved. At the bottom, 
there is a box for sending an email to the students who are moved to the new class. 

3.2.5 Statistics/Report 

This component produces reports on class usage, students' unit overload, etc. 

3.2.6 Download/Upload 

This function allows bulk upload of class, unit and student information to the 
database. A common delimited format file is used as input to upload to the web 
server via the web interface. 

4 Conclusions and Future Work 

This paper first described various interfacing mechanisms between the Web and 
corporate databases. The features of these web-database interfacing mechanisms 
were then presented. Finally, a web-based timetabling system implemented using 
one of the interfacing methods was described. Certain changes and improvements for 
the coming version have been identified. These concern the enrollment procedure, 
session management, portability, user interface, security, and performance. 

We are interested in investigating the performance implications of different 
web-to-database interfacing methods. This is vastly different from the general 
database system benchmarks, general Web server benchmarks, and WWW caching 
performance measurements reported in the literature [13] [14]. The performance 
measurement factors include connections per second, throughput (bytes per second), 
average response time (round-trip time), error rate, and web overhead ratio. We are 
also working on ways to minimise or eliminate potential factors that may influence 
the performance measurements. 
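
As a rough illustration of how the first four factors could be computed from a log of timed requests, consider the sketch below. The record layout, the name summarise, and the choice to measure the observation window from the first request sent to the last response received are assumptions, not definitions taken from this paper; the web overhead ratio is left out because it is not defined here.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    start: float  # seconds, when the client sent the request
    end: float    # seconds, when the complete response had arrived
    nbytes: int   # bytes received in the response
    ok: bool      # False if the request failed


def summarise(records):
    """Compute simple aggregate metrics over a non-empty list of records."""
    # Assumed observation window: first request sent to last response received.
    window = max(r.end for r in records) - min(r.start for r in records)
    count = len(records)
    return {
        "connections_per_second": count / window,
        "throughput_bytes_per_second": sum(r.nbytes for r in records) / window,
        "average_response_time": sum(r.end - r.start for r in records) / count,
        "error_rate": sum(not r.ok for r in records) / count,
    }
```

For example, four requests spread over a two-second window, one of which failed, give 2 connections per second and an error rate of 0.25.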

Currently, the system is being ported to Windows NT with ASP in Perl and 
MSQL 7.0 in order to study implementation issues and compare performance. 
Session management and stored procedures are included. 

Future work will include implementing the same application using the other 
interfacing methods discussed in this paper and conducting experiments to 
provide comprehensive quantitative performance comparisons. In addition, porting 
the application to different platforms would produce valuable feedback for the 
refinement of performance measurements. 



References 

1. Feng, L. and Lu, H. “Integrating Database and Web Technologies”, 
International Journal of World Wide Web, Vol. 1, No. 2, pp. 73-86, 1998. 

2. Carriere, J. and Kazman, R. “WebQuery: searching and visualizing the Web 
through connectivity”. Proc. of the 6th International ITWW Conference, pp. 
701-711, 1997. 

3. Murugesan, S., Deshpande, Y., Hansen, S. and Ginige, A. “Web Engineering: 
A New discipline for Development of Web-based Systems”. Proceedings of the 
first ICSE Workshop on Web Engineering, International Conference on 
Software Engineering, Los Angeles, May 1999. [ URL: 
http://fistserv.macarthur.uws.edu.au/san/icse99-webe/ ] 

4. Kim, Pyung-Chul. “A Taxonomy on the Architecture of Database Gateways for 
the Web.” Proceedings of The 13th International Conference on Advanced 
Science and Technology (ICAST97). pp. 226-232, 1997. 

5. Ashenfelter J.P. “Web Database Development Tools, Metaphorically Speaking”. 
WebNet Journal, pp. 14-15, 23, April-June 1999. 

6. Zhao, W. “A study of web-based application architecture and performance 
measurements”. Proceedings of 99Australian Web Conference (AUSWEB99), 
pp. 50-58, 1999. 

7. Reichard, K. “Web servers for database applications”. DBMS, v9(n11), p. 31, 
1996. 




8. Frey, A. “Web-to-database communication with API based connectivity 
software”. Network Computing, Nov 15, v7 n18: 134(7), 1996. 

9. Duan, N.N. “Distributed Database Access in a Corporate Environment Using 
Java.” Proceedings of the 5th International World Wide Web Conference, May 
6-10, 1996, Paris, France. 

10. Wille, C. Unlocking Active Server Pages, New Riders, 1997. 

11. Hunter, J. and Crawford, W. Java Servlet Programming, O’Reilly, 1998. 

12. Wong, W. “Back-end Web Databases (Making corporate data available through 
Web servers)”. Network VAR, 5(12), pp. 67-72, 1997. 

13. Lazar, Z.P. and Holfelder, P. “Web Database Connectivity with Scripting 
Languages.” Web Journal, Vol. 2, Issue 2. 1997. 

14. Saleeb, H. “Real-Time Database Theory and World Wide Web Caching”. 
[URL: http://www.eecs.harward.edu/~saleeb/projects/265.html] (Harvard Univ., 
1997) 




Performance, Testing and Web Metrics 



1 Overview 

One classification of Web-based systems is along the lines of intranet, extranet and 
Internet applications. Intranet and extranet applications generally have a 
well-identified user base, perhaps larger than but similar to that of traditional 
applications. Internet applications, however, reach untold numbers of users whose 
interests, behaviour patterns and circumstances could vary enormously, unlike the 
other two classes. 
Consequently, the questions of performance, testing strategies and measurements 
become much more complicated for Web-based applications than has been the case in 
the traditional application area. The five papers in this section address these aspects 
which are often overlooked in most Web applications. 

The first paper, Engineering Highly Accessed Web Sites for Performance, tackles 
these questions at truly Olympian heights by reporting on the actual experience of 
building and testing the Olympic sites for the 1998 Winter Olympics in Nagano, 
Japan, and the 2000 (Summer) Olympics in Sydney, Australia. It describes techniques 
for designing Web sites which need to handle large numbers of requests (hits), 
provide high availability, or generate significant dynamic content. These techniques 
include 
using load balancers to distribute load among multiple Web servers, Web server 
accelerators, and caching of dynamic pages. The paper presents the architecture of a 
scalable and highly available Web server accelerator which significantly improves 
Web server performance along with a publishing system for efficiently generating 
complex dynamic pages from simpler fragments. The authors describe new 
techniques specifically developed for keeping cached dynamic data current and 
synchronizing caches with underlying databases. The proof of the pudding is that the 
1998 Olympic Games Web site achieved quick response times even under peak loads 
which set world records and was available 100% of the time. The Sydney 2000 
(Summer) Olympics had not taken place by the time this paper was presented but by 
all anecdotal accounts, the Web server performance in those games was highly 
satisfactory. 

The second paper, Specifying Quality Characteristics and Attributes for Web Sites, 
examines the Web sites in a different way. Managing Web-based systems is a 
complex task and involves among other things, the management of: a) documents, b) 
links, c) performance of servers and networks, d) updating of the systems/sites, and e) 
measures to judge the suitability and performance of a given system/site. The paper 
analyses in detail several academic sites to identify and recommend over 100 criteria 
and sub-criteria to measure the usefulness and quality of Web sites. The authors build 
a quality requirement tree and a descriptive framework from the criteria, and 
demonstrate how these procedures help in a quantitative evaluation of a given site. 
The outcome is a Web-site Quality Evaluation Method (QEM), grounded in a logical, 
multi-attribute decision model and procedures. QEM is useful not only to improve 
operational Web sites but also in planning new applications and sites. 

The third paper, A Framework for Defining Acceptance Criteria for Web 
Development Projects, addresses the question of acceptance criteria. The success of 
any Web-based system ultimately depends on its acceptance by the intended, and 
unintended, users. It is, therefore, imperative to understand and enumerate the 
possible acceptance criteria from the users’ points of view. The paper proposes a 
framework for acceptance criteria to cover diverse dimensions such as: user’s 
problem statement, product vision, content modelling, user interaction, development 
constraints, non-functional requirements, application evolution and maintenance. 

The engineering approach demands consistent and regular measurements of all the 
activities undertaken. Web metrics is in its infancy and can benefit from software 
metrics. The fourth paper in this section, Measurement and Effort Prediction for Web 
Applications, reports on a case study in which a set of metrics is proposed for Web 
authoring. The data collected is analysed using three independent models: estimation 
by analogy, linear regression and stepwise linear regression. The metrics include 
variables such as hyperdocument size, number of reused documents, extent of 
connectivity, measure of compactness of links, rating for sequential and hyperlinked 
organisation, structure of the application and total effort in terms of estimated elapsed 
time to complete various tasks. 

The last paper in this section, Web Navigability Testing with Remote Agents, 
concentrates on usability testing with remote agents. Users expect highly effective 
and easy-to-learn navigation mechanisms and metaphors. They need to determine 
what they can get in their environment and how to find their way around. Usability 
testing, generally done in a controlled environment, focuses on navigation to get as 
much information as possible on how the end-user makes use of the different 
navigation mechanisms designed and incorporated in a Web site. Web usability 
testing in controlled environments has several drawbacks that can be overcome 
through remote testing using agents. The paper reports on a case study of remote 
testing designed to augment the classic usability testing. 



Abstract. This paper describes techniques for improving performance 
at Web sites which receive significant traffic. Poor performance can be 
caused by dynamic data, insufficient network bandwidth, and poor Web 
page design. Dynamic data overheads can often be reduced by caching 
dynamic pages and using fast interfaces to invoke server programs. Web 
server acceleration can significantly improve performance and reduce the 
hardware needed at a Web site. We discuss techniques for balancing load 
among multiple servers at a Web site. We also show how Web pages can 
be designed to minimize traffic to the site. 



1 Introduction 

Performance is a critical factor at many Web sites. Popular Web sites can receive 
hundreds of thousands of hits per minute. A Web site which receives only a 
moderate amount of traffic can also suffer from slow response times if a significant 
percentage of the requests are for dynamic data, which can consume orders of 
magnitude more CPU time to create than static data. 

There are a number of techniques which can be used to improve performance 
at a Web site. Multiple processors can be used to scale the CPU capacity of the 
system. We will describe techniques for balancing load among multiple processors. 
In order to reduce the overhead for generating dynamic pages, it is often 
possible to cache the dynamic pages and re-use cached copies instead of generating 
a new copy each time. The cache must be explicitly managed so that it 
is kept consistent. It is also important to use an efficient interface for invoking 
server programs to create dynamic data. 

Web pages should also be properly designed to optimize performance. Web 
pages can often be redesigned to provide more information close to the home 
page so that less navigation is required to obtain critical information. Encryption 
consumes significant CPU cycles and should only be used for confidential 
information; many Web sites use encryption for nonessential information such 
as all of the image files included in a Web page. 

The remainder of the paper is organized as follows. Section 2 describes how 
multiple Web servers can be deployed to service high request rates and techniques 
for routing requests to such servers. Section 3 describes how performance can 
be improved with Web server acceleration. Section 4 describes techniques for 
efficiently serving dynamic data. Finally, Section 5 describes other techniques 
affecting performance such as page design. 



2 Multiple Web Servers 

In order to handle significant traffic, Web sites must use multiple servers 
running on different computers. Some sites use mirroring in which different Web 
sites contain the same information. Clients are responsible for selecting one of 
the mirrored Web sites. In some cases, the mirrored sites are geographically 
dispersed, and clients are supposed to select the closest Web site. 

There are a number of problems with mirroring. Clients must select an 
appropriate Web site. This puts an extra burden on clients. There could be 
considerable load imbalances if some sites are selected more than others. Mirroring 
doesn’t solve the problem of routing requests from one site to another when one 
of the sites fails. There is also administrative work in maintaining multiple Web 
sites and providing consistent content across the Web sites. 

Because of the problems with mirroring, it is often preferable to have a 
single Web site being serviced by multiple servers running on different computers. 
The servers might share information using a shared file system such as the 
Andrew File System (AFS) or Distributed File System (DFS). Information can also 
be shared via a shared database or replicated across independent file systems 
running on the servers. 



2.1 Routing Requests to Multiple Web Servers 

One method for distributing requests to the various servers is by using the 
round-robin Domain Name Server [1,16] (RR-DNS). RR-DNS allows a single domain 
name to be associated with several IP addresses which could each represent 
different Web servers. Client requests specifying the domain name will be mapped 
to Web servers in a round-robin fashion. 
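
The round-robin mapping can be sketched in a few lines. The class below is a toy model for illustration only, not a real name server: the class name RoundRobinDNS, the host name and the addresses are invented.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy model: each domain name rotates through its list of IP addresses."""

    def __init__(self, records):
        # records maps a domain name to the addresses of its Web servers.
        self._rotations = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name: str) -> str:
        # Each lookup returns the next address in the rotation.
        return next(self._rotations[name])
```

With three addresses registered for one name, four successive calls to resolve return the first, second and third address and then the first again.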

There are several problems that arise with this method. First, caching of 
name-to-IP address mappings at name servers can cause load imbalances. There 
are typically several name servers between clients and the RR-DNS that cache 
the resolved name-to-IP address mapping. In order to force a mapping to 
different server IP addresses, the RR-DNS can specify a time-to-live (TTL) for a 
resolved name, such that requests made after the specified TTL are not resolved 
in the local name server, but are forwarded to the authoritative RR-DNS to 
be re-mapped to the IP address of a different HTTP server. Multiple name 
requests made during the TTL period will be mapped to the same HTTP server. 
If the TTL is made very small, there is a significant increase in network traffic 
for name resolution. Therefore, name servers often impose their own minimum 
TTL, and ignore small TTLs given by the RR-DNS. There is thus no way to 
prevent intermediate name servers from caching the resolved name-to-IP address 
mapping, even by using small TTLs. Many clients, for instance those served by 
the same Internet service provider, may share a name server, and may therefore 
be pointed to a specific Web server. 
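
The effect of an intermediate name server that enforces its own minimum TTL can be imitated with a small model. Everything here is hypothetical (the names CachingNameServer and make_rr_dns, the addresses, and the explicit now argument that stands in for a clock); the sketch only shows why every client behind such a name server lands on the same Web server until the cached entry expires.

```python
from itertools import cycle


class CachingNameServer:
    """Toy intermediate name server that caches answers from an authority."""

    def __init__(self, authoritative, min_ttl=0.0):
        self._authoritative = authoritative  # callable: name -> (address, ttl)
        self._min_ttl = min_ttl              # locally imposed minimum TTL
        self._cache = {}                     # name -> (address, expiry time)

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # served from cache; the RR-DNS is never asked
        address, ttl = self._authoritative(name)
        # A TTL smaller than the local minimum is ignored.
        self._cache[name] = (address, now + max(ttl, self._min_ttl))
        return address


def make_rr_dns(addresses, ttl):
    """Toy authoritative RR-DNS: rotates addresses and requests a given TTL."""
    rotation = cycle(addresses)
    return lambda name: (next(rotation), ttl)
```

Even though the authoritative server in this model asks for a one-second TTL, a caching server with a 60-second minimum keeps handing out the first address for a full minute.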

A second problem is that client caching of resolved name-to-IP address 
mappings can cause load imbalances. Since the clients may make future requests at 
any time, the load on the HTTP servers cannot be controlled subsequently and 
will vary due to statistical variations in client access patterns. Further, clients 
make requests in bursts as each Web page typically involves fetching several 
objects including text and images, and this burst is directed to a single server node, 
increasing the skew. It is shown in [8] that these effects can lead to significant 
dynamic load imbalance, requiring that the cluster be operated at lower mean 
loads in order to be able to handle peak loads. 

Another problem with RR-DNS is that round-robin is often too simplistic a 
method for providing good load balancing. It is desirable to consider other factors 
such as the load on individual servers. For instance, a particular Web server may 
become overloaded due to requests for dynamic data, which is constructed from 
many database accesses at a server node. 

Finally, another important problem with RR-DNS is that client and name server 
caching of resolved name-to-IP address mappings makes it difficult to provide 
high availability if Web server nodes fail. Since clients and name servers are 
unaware of Web servers going down, they may continue to make requests to failed 
servers. Similarly, it may be desirable to bring down a specific Web server node 
of a cluster for maintenance purposes. Again, making IP addresses of individual 
Web servers visible to the client and name servers makes it more difficult to 
achieve this. It is possible to configure back-up servers and perform IP address 
take-overs when Web server node failures are detected, or when a node is to be 
brought down for maintenance. However, not only is this hard to manage, but 
if the back-up node is active, it may get twice the load after failure of a primary 
node. 

Another method for achieving load balancing is based on routing at the TCP 
level (rather than standard IP routing), and is illustrated in Figure 1. A node of 
the cluster serves as a so-called TCP router, forwarding client requests to the 
different Web server nodes in the cluster in a round-robin (or other) order. The 
name and IP address of the router are public, while the addresses of the other 
nodes in the cluster are hidden from clients. The client sends requests to the 
TCP router node, which in turn forwards all packets belonging to a particular 
TCP connection to one of the server nodes. The TCP router can use different 
algorithms based on load to select which node to route to, or use a simple 
round-robin scheme. The server nodes send the response directly back to the client, 
bypassing the TCP router. Because the response packets are large compared 
to the request packets and bypass the router, the overhead added by 
the TCP router is small. 

Fig. 1. A TCP router can load balance requests to multiple Web servers. 
Responses from the server go directly to clients, bypassing the router 

One advantage of the router scheme over the DNS-based solutions is that 
good load balancing can be achieved and there is no problem of client or name 
server caching. It is shown in [8] that the use of a TCP router results in better 
load balancing than RR-DNS. Another advantage of using TCP routers is that 
the router can use sophisticated load balancing algorithms which take the load 
on individual servers into account as opposed to simple round-robin. Finally, the 
TCP router provides high availability by detecting failure of Web server nodes, 
and routing user requests to only the available Web server nodes. In addition, 
for easy maintenance of the Web server cluster, the TCP router configuration 
can be changed to remove or add Web server nodes. Failure of the TCP router 
node itself is handled by configuring a back-up TCP router [8]. The back-up 
TCP router can operate as a Web server during normal operation; on detecting 
failure of the primary TCP router, the back-up TCP router would route client 
requests to the remaining Web server nodes, possibly excluding itself. 
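
The routing decisions described above can be modelled in a short sketch. This is a simplified simulation, not the algorithm of any product: the class name TCPRouter and its methods are invented, and the least-connections rule is just one possible load-based policy, chosen here as an assumption because the text does not name a specific one.

```python
class TCPRouter:
    """Toy model of a TCP router in front of a cluster of Web server nodes."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # available node -> open connections
        self.connections = {}                  # connection id -> node

    def route(self, conn_id):
        # All packets of an existing TCP connection go to the same node.
        if conn_id in self.connections:
            return self.connections[conn_id]
        if not self.active:
            raise RuntimeError("no Web server nodes available")
        # Assumed policy: least connections, with the node name as tie-breaker.
        server = min(self.active, key=lambda s: (self.active[s], s))
        self.active[server] += 1
        self.connections[conn_id] = server
        return server

    def close(self, conn_id):
        server = self.connections.pop(conn_id)
        if server in self.active:
            self.active[server] -= 1

    def mark_failed(self, server):
        # A failed node is removed from the pool; its connections are dropped,
        # so later packets with those ids are routed as new connections.
        self.active.pop(server, None)
        self.connections = {
            c: s for c, s in self.connections.items() if s != server
        }

    def add_server(self, server):
        # Nodes can be added back, for instance after maintenance.
        self.active.setdefault(server, 0)
```

Because clients only ever see the router's address, removing a node with mark_failed or restoring it with add_server takes effect on the very next connection, which is the availability advantage over RR-DNS noted above.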

There are a number of commercially available TCP routers. One example 
is IBM’s Network Dispatcher (ND) [10] which runs on stock hardware under 
several Operating Systems (OS), including Unix, Sun Solaris, Windows NT, 
and an embedded OS optimized for communication. The advantage of using an 
embedded OS is that router performance is improved by optimizing the TCP 
communications stack, and eliminating the scheduler and interrupt processing 
overheads of a general-purpose operating system. ND can route up to 10,000 
HTTP requests per second (when running under an embedded OS on a unipro- 
cessor machine). Other commercially available TCP routers are the Web Server 
Director by Radware [17] and the Resonate Central Dispatch [18]. Cisco Sys- 
tems’ LocalDirector [7] differs from the TCP router approach because packets 
returned from servers go through the LocalDirector before being returned to 
clients. A comparison of different load balancing approaches is contained in [2]. 

If a single TCP router has insufficient capacity to route requests to a site 
without becoming a bottleneck, the TCP router and DNS schemes can be used 
together. For example, a number of router nodes can be used, and the RR-DNS
method can be used to map different clients to different router nodes. This
hybrid scheme can tolerate the load imbalance achieved using RR-DNS because 
the corresponding router will route any burst of requests that were mapped 
by the RR-DNS to the same router to different server nodes. It achieves good 
scalability because (i) a long TTL can be used so that the node running the 
RR-DNS does not become a bottleneck, and (ii) several router nodes can be 
used achieving scaling beyond that of a single router. 

One characteristic of many load balancing algorithms is that they spread 
requests from the same client across the Web server nodes; while this is often 
desirable, some applications need requests routed to specific servers. To support 
such applications, ND allows requests to be routed with an affinity towards spe- 
cific servers. An example is the manner in which ND handles requests encrypted 
using SSL (Secure Sockets Layer). SSL generates a session key which is used 
for encrypting information passed between a client and server. Session keys are 
expensive to generate. In order to avoid regenerating a session key for every 
SSL request, session keys typically have a lifetime of about 100 seconds. After a 
client and server have established a session key, all requests within the session 
key lifetime between the specific client and server will use the same session key. 

In a system with multiple Web servers, however, one Web server will not 
know about session keys generated by another Web server. If a simple load 
balancing scheme like round-robin is used, there is a high probability that two 
SSL requests from the same client within the lifetime of a session key will be 
sent to different servers resulting in unnecessary generation of session keys. ND 
avoids this problem by routing two SSL requests received from the same client 
within 100 seconds of each other to the same server. 
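
A minimal sketch of this kind of affinity routing follows. The 100-second window comes from the text; the table layout, the round-robin fallback, and all names are assumptions made for illustration and do not describe ND's internals.

```python
AFFINITY_SECONDS = 100  # session-key lifetime quoted in the text


class AffinityTable:
    """Route SSL requests from the same client to the same server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0   # round-robin position for clients without affinity
        self._last = {}  # client address -> (server, time of last SSL request)

    def route(self, client, now):
        entry = self._last.get(client)
        if entry is not None and now - entry[1] <= AFFINITY_SECONDS:
            server = entry[0]  # reuse the server that holds the session key
        else:
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
        self._last[client] = (server, now)
        return server
```

Here `now` is passed in explicitly (in seconds) so the behaviour is easy to test; a production table would also need to evict stale entries.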

2.2 Geographically Distributed Web Servers 

In addition to a cluster of nodes at a single site, geographically distributed Web
servers supporting a single Web site provide for higher availability in the face 
of catastrophic failures at a location, and also can provide better response time 
by routing client requests to the lowest latency site for that client. There are 
a number of methods for providing load balancing of client requests among 
geographically distributed Web sites including: (i) Manual; (ii) DNS-based; (iii) 
Open Shortest Path First (OSPF)-based; (iv) Geographical TCP routing.

(i) Manual (naive) load balancing: This is the obvious method, where the client 
is given a choice of location, often based on a country selection, or geographical 
area. The obvious advantage is the simplicity of implementation. The drawbacks 
include that it places the burden on the client, and the load across the servers 
is not system controlled, leading to poor load balancing. 

(ii) DNS-based: Geographical load balancing based on extending the basic Domain
Name Server techniques is gaining increasing popularity. As discussed in
Section 2.1, DNS can be used to map incoming name resolution requests for a Web
server name to different IP addresses, which in the geographically distributed
case are the (single) IP addresses of the individual sites. Each site, in turn,
could be hosted on multiple computers, as discussed in Section 2.1.

Various algorithms can be used by the extended DNS to map different clients 
to different sites. The RR-DNS, outlined in Section 2.1, is the simplest, and maps 
clients to sites in a round-robin manner. One drawback is that RR-DNS does 
not take into account the current load on the sites, or the proximity of the 
client to servers. Several other DNS site selection algorithms have been devised. 
In one technique, the sites send load information to the DNS periodically; the 
DNS maps incoming requests to the sites based on last known load, such as 
the least loaded site, or round-robin among sites below a threshold load. The 
DNS could also add an estimated load for each mapping request made since the
last load report, so as to avoid overloading a previously lightly loaded site
before the next load information is received. As explained
in Section 2.1, the DNS technique is moderate at best in balancing the load, 
because of caching in name servers and at the client. Nevertheless, site-level load 
balancing using these and other similar techniques works reasonably well [2]. 
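
The load-based mapping with an estimated load per mapping request can be sketched as follows. The class, the addresses, and the fixed per-mapping estimate are hypothetical; the text does not specify how the estimate is chosen.

```python
class LoadAwareDns:
    """Map each name resolution request to the site with the lowest estimated load."""

    def __init__(self, site_ips, per_mapping_estimate=1.0):
        self.reported = {ip: 0.0 for ip in site_ips}  # last load sent by each site
        self.pending = {ip: 0.0 for ip in site_ips}   # load assumed since that report
        self.per_mapping_estimate = per_mapping_estimate

    def report_load(self, ip, load):
        # A fresh report replaces the estimate accumulated since the last one.
        self.reported[ip] = load
        self.pending[ip] = 0.0

    def resolve(self):
        ip = min(self.reported, key=lambda s: self.reported[s] + self.pending[s])
        self.pending[ip] += self.per_mapping_estimate
        return ip
```

Without the `pending` term, every request arriving between two reports would be mapped to the same lightly loaded site.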

In another extended DNS technique, some source (i.e. requestor) IP addresses 
can be associated with certain geographies, such as country of origin; in this case, 
the DNS can map the request to the nearest geographically located sites. How- 
ever, several IP address origins, such as those from multi-national companies, 
cannot be identified by country or geographical proximity. For such cases, an- 
other known technique is referred to as WOMbat [9]. In this technique, either 
the DNS and/or another set of sites ping the source IP address, and the site to 
serve the request is based on the response times to the pings measured. Measur- 
ing the “ping triangulation” delays is not possible for each request, because of 
the overheads and delays this incurs. Tables can be maintained for the best site 
to serve certain sets of source IP addresses. 

Combinations of the above techniques can also be used. For example, the
closest site, if it can be identified as such, can be selected unless the load
at this site is above a threshold; if so, the next closest site, or another
site, can be selected.
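
One way to express this combined policy is sketched below. The proximity ordering is assumed to be available already (for example from a table of source IP address ranges), and the fallback to the least loaded site when every site is over the threshold is an assumption, since the text leaves that case open.

```python
def select_site(sites_by_proximity, load, threshold):
    """Pick the nearest site whose load is at or below the threshold.

    sites_by_proximity: site names ordered nearest first for this client.
    load: mapping from site name to its last known load.
    """
    for site in sites_by_proximity:
        if load[site] <= threshold:
            return site
    # Assumed fallback: every site is busy, so take the least loaded one.
    return min(sites_by_proximity, key=lambda s: load[s])
```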

(iii) Open Shortest Path First (OSPF)-based: OSPF is a routing protocol supported
by (most) routers on the Internet. OSPF determines the lowest cost route
to a destination IP address from a specific router. In using OSPF to balance load 
across multiple Web sites, the Web servers are in a single subnet, and each ad- 
vertises the same IP address. Thus, the routers near the requestor route the 
request along the lowest cost path to a selected Web server node. 

This technique has been used for a number of IBM sports sites, including the 
Olympic Games Web site, to balance the load among multiple Web sites. 

(iv) Geographical TCP routing: One of the above techniques can be used to 
achieve coarse load balancing across a set of geographically distributed sites 
used to support a single Web site. However, once a Web client is directed to a 
Web site in this manner, especially with the DNS or manual techniques, the Web 
server loses control, and the clients, or many clients behind a gateway, can make
requests to one site, potentially overloading it. In such a case, a TCP router at
that site can detect the overload situation, and re-route Web requests to one of 
the other sites. We refer to this as geographical TCP routing; this technique is 
supported by IBM’s Network Dispatcher. 

The TCP router at each site periodically gets load information from the
dispatchers at the other sites. When the load at one site is above a threshold, and
the load at a set of other sites is below a threshold, incoming Web requests are 
re-directed to the other sites. This redirection can use various algorithms, such 
as round-robin among other qualifying sites or weighted round robin based on 
load or other criteria. 
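
The redirection rule can be sketched as below. The two thresholds and the load values are hypothetical, and a weighted random choice is used here as a simple stand-in for weighted round robin; it is not the algorithm of any particular product.

```python
import random


def choose_site(local, loads, high, low, rng=random):
    """Decide which site should serve a request that arrived at `local`.

    loads: mapping from site name to its most recently exchanged load.
    high:  load above which the local site starts redirecting.
    low:   load below which a remote site qualifies as a target.
    """
    if loads[local] <= high:
        return local  # normal case: serve locally
    candidates = [s for s in loads if s != local and loads[s] < low]
    if not candidates:
        return local  # nowhere better to send the request
    # Sites with more headroom below `low` receive proportionally more traffic.
    weights = [low - loads[s] for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```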

A combination of these techniques, using OSPF-based or DNS-based tech- 
niques for coarse-grained load balancing, and geographical TCP routing for fine- 
grained load balancing, works well in practice. 



3 Web Server Accelerators 

The performance of Web servers is limited by several factors. In satisfying a re- 
quest, the requested data is often copied several times across layers of software, 
for example between the file system and the application and again during trans- 
mission to the operating system kernel, and often again at the device driver level. 
Other overheads, such as operating system scheduler and interrupt processing, 
can add further inefficiencies. One technique for improving the performance of 
Web sites is to cache data at the site so that frequently requested pages are 
served from a cache which has significantly less overhead than a Web server. 
Such caches are known as httpd accelerators [6] or Web server accelerators. 

We have developed a Web server accelerator [13] which runs under an embed- 
ded operating system and can serve up to 5000 pages/second from its cache on a 
uniprocessor 200 MHz PowerPC. This throughput is up to an order of magnitude 
higher than that which would typically be achieved by a high-performance Web 
server running on similar hardware under a conventional operating system. The 
superior performance of our system results largely from the embedded operating
system, which optimizes the TCP communications stack and largely eliminates
scheduler and interrupt processing. Buffer copying is kept to a minimum. The
operating system is unsuitable for implementing general-purpose software appli- 
cations (like database applications or on-line transaction processing) because of 
its limited functionality. However, it is well-suited to specialized network ap- 
plications such as Web server acceleration because of its optimized support for 
communications.

In order to maximize hit rates and maintain updated caches, our accelerator 
provides an API which allows application programs to explicitly add, delete, and 
update cached data. Consequently, we allow dynamic Web pages to be cached as 
well as static ones, since applications can explicitly invalidate any page whenever 
the page becomes obsolete. Caching of dynamic Web pages is important for 
improving the performance of many Web sites containing significant dynamic 
content. 

As illustrated in Figure 2, the accelerator can be placed in front of a set
of Web server nodes. A TCP router runs on the same node as the accelerator 
(although it could also run on a separate node). If the requested page is contained 
in the cache, the page is returned to the client. Otherwise, the TCP router selects 
a Web server node to service the request, and the request is sent to the selected 
Web server node. Our accelerator can significantly reduce the number of Web 
servers needed at a Web site since a large fraction of the Web requests can be 
handled by the accelerator cache. A nice feature of our accelerator is that it can 
be used in conjunction with any server platform; special support in the operating 
system for serving platforms is not required. 
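
The request path and the explicit cache-management calls can be illustrated with the following sketch. The method names `add`, `update`, and `delete` mirror the operations named in the text, but their signatures are assumptions; this is not the accelerator's actual API.

```python
class AcceleratorSketch:
    """Cache in front of a set of Web server nodes, managed by applications."""

    def __init__(self, route_to_server):
        self._cache = {}
        # Stand-in for the TCP router selecting a server node and forwarding.
        self._route_to_server = route_to_server

    # Explicit cache management, callable by application programs.
    def add(self, url, page):
        self._cache[url] = page

    def update(self, url, page):
        self._cache[url] = page

    def delete(self, url):
        self._cache.pop(url, None)

    def handle_request(self, url):
        if url in self._cache:
            return self._cache[url]        # served from the accelerator cache
        return self._route_to_server(url)  # cache miss: a server node answers
```

Because the application decides when a page enters or leaves the cache, a dynamic page can stay cached until the data behind it changes.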

There are a number of Web server accelerators which are implemented either 
in network routers, or as kernel-mode caches on the serving platform. Kernel- 
mode accelerators generally require special operating system support. Examples 
are IBM’s Adaptive Fast Path Architecture (AFPA) cache, Microsoft’s Scalable 
Web Cache (SWC) [15], and kHTTPd for Linux (http://www.fenrus.demon.nl/). 
Novell sells an httpd accelerator as part of its BorderManager product [12]. 

Web server accelerators can also be geographically distributed. In this case, 
they differ from proxy caches in that they cache content for specific sites, rather 
than caching data from all Web servers. IBM has used geographically distributed 
Web server accelerators for hosting a number of highly accessed sports sites. 

Another example of a distributed Web server accelerator is the service offered 
by Akamai (www.akamai.com). Akamai has a large number of distributed Web 
caches, on the order of a few thousand. Web servers utilize Akamai’s caching 
service by running a utility at the Web server which modifies the URLs of em- 
bedded objects in pages to point to Akamai’s caches. The base Web page is 
fetched from the server, while the embedded objects, such as images, are obtained
from the Akamai caches. Akamai uses a DNS-based scheme to distribute
requests for cached objects among their caches. 

Web server acceleration services are also provided by other providers, such 
as Digital Island (http://www.digisle.net/), Mirror Image Internet 
(http://www.mirror-image.com/), and epicRealm (www.epicrealm.com). 

4 Efficiently Serving Dynamic Data 

Web servers provide two types of data: static data from files stored at a server 
and dynamic data which are constructed by programs that execute at the time 
a request is made. Dynamic pages can seriously reduce Web server performance. 
High-performance Web servers can typically deliver several hundred static files 
per second. By contrast, the rate at which dynamic pages are delivered is often 
orders of magnitude slower; it is not uncommon for a program to consume over 
a second of CPU time in order to generate a single dynamic page. For Web sites 
with a high proportion of dynamic pages, the performance bottleneck is often 
the CPU overhead associated with generating dynamic pages. 

Dynamic pages are essential at Web sites which provide data that change 
frequently. If pages are generated dynamically by a server program, the server 
program can return the most recent version of the data. If, on the other hand, 
the data are stored in files and served from a file system, it may not be feasible 
to keep the files current. This is particularly true if the number of files which 
need to be updated frequently is large. 

One technique for improving performance of dynamic data is to cache dy- 
namic pages the first time they are created. That way, subsequent requests for 
the same dynamic page can access the page from a cache instead of repeatedly 
invoking a program to generate the same page. This was a key technique for
improving performance at the 1996 and 1998 Olympic Games Web sites [11,3].
These Web sites served significant amounts of dynamic data, and caching was a
critical component in reducing the amount of hardware needed for the Web sites.

In order to keep the cached data consistent, the server should explicitly man- 
age the cache contents instead of relying on expiration times. Algorithms for 
keeping cached dynamic data consistent are described in [4].

Dynamic data cannot always be cached. Some requests cause updates to oc- 
cur at the server and thus must invoke a server program. If a Web site is gener- 
ating pages which are personalized to individual clients, specific pages might not 
be accessed by multiple clients which makes caching ineffective. A technique we 
have developed to reduce dynamic overhead in this situation is to generate Web 
pages from fragments. Personalized parts of the page can be confined to specific 
fragments. In order to generate a personalized page, a personalized fragment 
is added to a template containing the rest of the page, a process which incurs 
significantly less overhead than regenerating the entire page from scratch [5].
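
A minimal sketch of fragment-based page assembly is shown below. The template, the fragment names, and the greeting are invented for illustration; the approach in [5] is more general than this.

```python
from html import escape
from string import Template

# Template holding the part of the page that is identical for every client.
PAGE_TEMPLATE = Template(
    "<html><body>$greeting<div id='news'>$headlines</div></body></html>"
)

# Shared fragments are generated once and reused for all clients.
shared_fragments = {"headlines": "<ul><li>Results updated</li></ul>"}


def personalized_greeting(user_name):
    # The only per-client work: one small fragment.
    return f"<p>Welcome back, {escape(user_name)}</p>"


def build_page(user_name):
    return PAGE_TEMPLATE.substitute(
        greeting=personalized_greeting(user_name), **shared_fragments
    )
```

Only `personalized_greeting` runs per request; the expensive shared fragments are not regenerated.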

Many Web sites create dynamic data from databases. Inefficient implemen- 
tations will make a new connection to the database for each access. Connecting 
to a database can incur significant CPU overhead. It is more efficient to maintain
open connections to the database via long-running processes so that new
connections are not required for each access [14]. 
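
The idea of keeping connections open can be sketched with a small pool. SQLite from the Python standard library is used here purely as a stand-in database, and the pool class is hypothetical; the mechanism discussed in [14] may differ.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once and hand them out repeatedly."""

    def __init__(self, database, size=4):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a pooled connection move between threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it
```

A page-generating program would wrap each access in `with pool.connection() as conn:` rather than connecting on every request.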

The next section discusses techniques for getting more out of servers. We 
examine one aspect of what distinguishes a static page from a dynamic one and 
try to determine if pages once thought uncachable might, in fact, be cachable.
We discuss the use of database and application triggers to automatically generate 
HTML pages. Some of the consequences of the use of server-side processing and 
inefficient HTML are demonstrated. Finally we discuss pregeneration of HTML 
pages in more detail, with an eye toward separating server functions from page 
generation functions to allow better tuning of servers. 



4.1 Myths about the Cachability of Dynamic Data 

There has long been a misconception that dynamic pages are not cachable. It is 
true that some pages must be recreated on each fetch of the page. However, a 
great deal of data commonly thought of as dynamic is, in fact, highly cachable. 

The difficulty with managing changing pages is controlling cache coherence. 
If a page changes at intervals close to, or greater than the natural lifetime of 
objects in caches, it is easier to manage. If the page changes more frequently, 
cache management becomes more difficult. But if frequently changing pages can 
be cached and updated properly, the savings can be considerable. For example,
pages with current scores of sporting events are often requested hundreds of 
times a second while the event is in progress, making it quite profitable to invest 
in the overhead of managing short lifetimes. 

While an argument can be made that, given sufficient disk space and com- 
puting power, nearly every page is cachable, the cost of caching some pages can 
exceed the cost of generating the page on each request. Such pages can validly 
be considered non-cachable. For instance, attempting to cache airline reservation 
information is difficult because the data changes too frequently and the number 
of pages affected is very large. Current stock quotes are another example where 
the update rate may be significantly higher than the request rate. 

The choice of where and when an object should be cached can also influ- 
ence server design. Private data such as bank balances might be cachable in 
a well-controlled web server accelerator but not in public proxy caches. Server 
performance can be improved this way, but traffic at the proxy remains un- 
changed. Highly public data, however, can benefit from proxy caching because 
the more general appeal of such data increases the frequency of access. The na- 
ture of the data is also a consideration. Some data can tolerably be somewhat 
out of date, significantly increasing its cachability. Stock quotes, for instance, 
are often served with 20-minute delays and disclaimers regarding their use for 
buy/sell decisions. This relaxation of requirements could well be sufficient to
permit successful caching in Web accelerators and even in proxies. 

Consider a bank account where the bank’s server has a Web server accel- 
erator that is managed directly by the server itself. When the balance is first 
requested, the server places a copy of the returned page in its accelerator cache
and notes that the user is active. This also causes a trigger to be set in the
database that results in the page being regenerated and recached if any
change occurs. After a period of inactivity, or if the user explicitly logs out, the 
trigger and cached copies are then removed. This scheme can be optimized if 
a login process is required to establish a session. Part of session initialization 
would include prefetching commonly accessed information such as the current 
account summary and establishing appropriate triggers. The key idea here is 
that users often view shopping carts or account summaries many times before 
making a change that affects the underlying data. This technique can be used 
for many applications traditionally considered uncachable such as shopping carts 
and online banking. 
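
The scheme can be sketched as follows. The database trigger is modelled by a plain method call, and the class and method names are hypothetical; the point is only that a page is regenerated when its data changes, not on every view.

```python
class AccountPageCache:
    """Cache account pages for active users and refresh them on change."""

    def __init__(self, render):
        self._render = render  # function: account id -> HTML page
        self._pages = {}       # cached pages; presence means a trigger is set

    def view(self, account):
        # First view caches the page and marks the user as active.
        if account not in self._pages:
            self._pages[account] = self._render(account)
        return self._pages[account]

    def on_account_changed(self, account):
        # Plays the role of the database trigger: regenerate and recache,
        # but only for accounts that currently have a cached page.
        if account in self._pages:
            self._pages[account] = self._render(account)

    def logout(self, account):
        # Inactivity timeout or explicit logout removes trigger and page.
        self._pages.pop(account, None)
```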

Pages whose content can be classified as news are potentially quite cachable 
regardless of the update rate, because a great deal is known about both the 
update rate and the request rate. In general, news classified as current is the news 
most people fetch: this morning’s headlines, the current sports scores, today’s 
Doonesbury. News servers can therefore benefit greatly by forcing pages into 
cache as soon as they are generated, and without waiting for an explicit request. 
Cache misses against such pages can be very costly, because of their high request 
rate. When such a page is allowed to expire from cache, each subsequent request 
generally requires a new fetch from the server, thus causing disruptive request 
spikes at the server. Some proxies and web accelerators do not queue subsequent 
requests for a page if a fetch for the page is already pending to the host because 
of the difficulty of error recovery. Rather, if the requested object is not in cache, 
the request is forwarded to the host regardless of whether a similar request is 
already in progress. This is not a problem for infrequently accessed pages, but 
for popular pages, this can cause a great number of redundant host accesses until 
the object finally arrives in cache. This entire problem can be sidestepped for 
pages known to be popular by simply prefetching as soon as they are available. 
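
The effect of a cache miss on a popular page can be estimated with a small model. The request rate and fetch latency used in the usage note are assumed values chosen for illustration, not measurements.

```python
def host_fetches(request_times_ms, fetch_latency_ms, prefetched):
    """Count requests for one page that are forwarded to the host.

    Models a cache that does not queue requests behind a pending fetch:
    every request arriving before the first fetch completes goes to the host.
    """
    if prefetched:
        return 0  # the page is already in cache when the first request arrives
    arrival_ms = min(request_times_ms) + fetch_latency_ms
    return sum(1 for t in request_times_ms if t < arrival_ms)
```

For example, with one request every 5 ms (200 requests per second) and an assumed fetch latency of 500 ms, `host_fetches(range(0, 2000, 5), 500, prefetched=False)` counts 100 host accesses for a single page, whereas prefetching reduces that to none.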

Ideally, caches permit authenticated applications to directly manipulate their 
contents by adding, removing, and updating items. Unfortunately there is no 
standardization of cache protocols and APIs among vendors, so few commercial 
caches provide this ability yet. Older protocols such as ICP were designed to solve 
cache-to-cache communication problems and do not provide sufficient function 
for host-to-cache control. 

Some caches allow authorized servers to fetch items via specially configured 
addresses. These addresses can be configured to bypass the cache for the fetch, 
but to add/replace the item in the cache when it arrives, thus providing a sort 
of “proxy-client” update mechanism. Unfortunately there is no comparable way 
to delete items, so if explicit object deletion is required rather than object re- 
placement, a different mechanism such as time-based expiration is required. 

4.2 Overuse of HTTP Services 

Another significant cause of overhead on http servers is the use of server ex- 
tensions. These extensions have become very popular and come in many forms: 
server side includes (SSI), Java Servlets, Java Server Pages (JSP), the mod_perl 
extensions to Apache, Microsoft’s Active Server Pages (ASP), as well as more
low-level interfaces such as Netscape’s NSAPI and Apache’s modules. The Com- 
mon Gateway Interface (CGI) is still used but due to its very high overhead, is 
being replaced by the other, more efficient, approaches. 

These extensions can provide extremely useful services, particularly when a 
large number of similar pages can be generated from a site. For example, Mi- 
crosoft’s Terraserver (http://terraserver.microsoft.com) uses ASP to implement 
point-and-click navigation over aerial photographs of most of the continental 
United States. 

However, these extensions use cycles that may be better consumed serving 
pages. One common example is the “factoid”, or “thought of the day”, a short 
snippet of trivia or wisdom chosen at random and appended to each page of 
a site. These are usually included by some form of server-side parser such as 
SSI. The cost of doing this is very large: the server must parse every page as it 
serves it. However, we have found that statically choosing a random factoid only 
at initial page generation time produces significant savings at the server. Many 
users don’t even notice the factoid, and those that do don’t generally reload 
the page repeatedly just to see how it changes. If the site has any elements of 
change (all news sites, for example), the factoid tends to change fairly frequently 
anyway. 

The per-page overhead of simple features such as factoids may seem small. 
However, the server has to serve those pages over and over, and at very high 
rates on a busy site. The accumulation of small overheads such as serve-time 
factoids can be significant. 

Suppose a server extension is used to add a factoid to a page. Let us assume 
that the average number of bytes of HTML per page is 10,000 bytes. This could 
easily require an extra 20,000 instructions to parse the page, find the SSI direc- 
tive, fetch the factoid text, insert it into the page, and finally be ready to serve 
the page. The IBM sports sites are starting to experience extended peak request 
rates approaching 1,000,000 pages per minute. These peak rates are no longer the 
spikes seen in the early days of the web, but are extended, flat peaks lasting as 
long as several hours. To insert a factoid at serve time, given these assumptions, 
requires additional processing power sufficient to execute (1,000,000/60) * 20,000
= 333,333,333 additional instructions per second. It is for this reason that the 
IBM sports servers have switched to more static factoids, generated only once 
at page composition time, rather than at page serve time. 

Inefficient HTML is another significant performance drain. Excessive use of 
scripting, deeply nested tables, long hypertext references, and embedded com- 
ments and blanks can significantly increase the size of a page. One site recently 
analyzed has over 6,000 bytes of Javascript served on every page, whether the 
script is needed or not, as a result of the design of the common headers and foot- 
ers for each page. Suppose one of the sports sites mentioned above was serving 
these pages. During peak periods of 1,000,000 pages per minute the site would 
have to serve an additional (1,000,000 / 60) * 6,000 = 100,000,000 bytes per 
second, requiring a very substantial increase in network infrastructure as well as
additional CPU horsepower to feed the network.
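
The two estimates above follow from the same arithmetic, reproduced here so the figures can be checked or recomputed for other request rates. The conversion to megabits per second is an added convenience and does not appear in the text.

```python
PAGES_PER_MINUTE = 1_000_000          # peak request rate from the text
INSTRUCTIONS_PER_FACTOID = 20_000     # assumed cost of serve-time insertion
EXTRA_BYTES_PER_PAGE = 6_000          # unneeded Javascript per page


def per_second(cost_per_page, pages_per_minute=PAGES_PER_MINUTE):
    """Total extra cost per second, rounded down to a whole unit."""
    return pages_per_minute * cost_per_page // 60


extra_instructions = per_second(INSTRUCTIONS_PER_FACTOID)  # 333,333,333 per second
extra_bytes = per_second(EXTRA_BYTES_PER_PAGE)             # 100,000,000 per second
extra_megabits = extra_bytes * 8 // 1_000_000              # 800 Mbit/s
```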

4.3 Pre-generation of Pages 

Disk space is inexpensive these days. Several 40 GB disks are significantly less 
expensive than a new server. If a site is heavily visited, pre-generating all pages 
and serving them as flat files can result in significant savings. This is simple when 
the database is static or mostly static. If the database changes often, it may still 
be easy to “flatten” the pages. Industrial-strength databases usually implement 
the concept of database triggers. Through the use of database triggers, one can 
easily associate HTML page generation with database updates. This technique 
has allowed the IBM Olympic Games and Sports sites to actually decrease the 
amount of hardware, despite extremely high growth of traffic. Techniques for 
doing this are described in [4]. 

One problem with database triggers is that most real-world events cause mul- 
tiple database updates. Triggering on each database update results in expensive 
redundant triggers that can be difficult to manage. If the transaction has to 
be rolled back before it is complete, some triggers would have been erroneously 
delivered, resulting in the generation of inconsistent pages. This problem can 
be solved with the use of a “commit table”. The database loader is modified 
so that after committing the actual transaction updates, a single update to the 
commit table is made summarizing the transaction. The database trigger is now 
attached only to the commit table. This technique solves the problem of rollback, 
and reduces the number of trigger activations to a minimum. 
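
A commit table can be sketched with SQLite from the Python standard library. The table names, the `page_jobs` queue that the trigger writes to, and the loader function are all hypothetical; the sites described in the text used a different database and their own page generation machinery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (event_id INTEGER, athlete TEXT, score REAL);
    CREATE TABLE commit_table (event_id INTEGER);
    CREATE TABLE page_jobs (event_id INTEGER);
    -- The only trigger is attached to the commit table, not to scores.
    CREATE TRIGGER regenerate_pages AFTER INSERT ON commit_table
    BEGIN
        INSERT INTO page_jobs (event_id) VALUES (NEW.event_id);
    END;
""")


def load_event(event_id, rows, fail=False):
    """Load all rows for one event, then record a single commit-table entry."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO scores VALUES (?, ?, ?)",
                [(event_id, athlete, score) for athlete, score in rows],
            )
            if fail:
                raise RuntimeError("simulated loader failure")
    except RuntimeError:
        return False  # rolled back: no commit-table row, so no trigger fires
    with conn:
        conn.execute("INSERT INTO commit_table VALUES (?)", (event_id,))
    return True


def pending_jobs():
    return [row[0] for row in conn.execute("SELECT event_id FROM page_jobs")]
```

However many rows an event touches, page generation is requested once, and a rolled-back load requests nothing.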

Database triggers are not always available or practical to use. In these cases, 
simple log-following programs that parse logs produced by database loaders or 
content generating programs can be used in lieu of database triggers. If the 
publishing tools have APIs, appropriate hooks can sometimes be inserted into 
the tools to trigger page generation. 

These examples illustrate the idea of separating page serving from page com- 
position. Web servers generate a unique load on the system. Page composition 
creates a significantly different type of load. It can be difficult to tune a sys- 
tem that performs both tasks well, and it can be even more difficult to tune 
the system to do both tasks well in the same process (such as a threaded http 
server). By partitioning two or more systems so that some are dedicated solely 
to the task of generating pages, and dedicating the rest to serving, systems can 
be tuned for maximum throughput. 



5 Other Factors Affecting Performance 

Web page design can have a significant impact on performance. Web pages should 
be designed to convey useful information in a limited number of pages so that 
clients don’t navigate through too many pages to obtain the information that 
they are looking for. For example, Web page design for the 1998 Olympic Games 
Web site was considerably improved over the design for the 1996 Olympic Games
Web site in order to reduce the number of intermediate pages viewed for accessing 
information. Figure 3 shows a home page from the 1996 Olympic Games Web 
site. The page doesn’t contain much useful information. Clients must navigate to
other pages in order to obtain the information they are looking for. By contrast, 
the home page for the 1998 Olympic Games Web site shown in Figure 4 contains 
important information on it. Fewer pages must be navigated through in order to 
obtain useful information. We estimate that moving from the 1996 page design 
to the 1998 page design may have decreased hits by more than a factor of two. 

In order to reduce the number of hits to a Web site, static content can be 
cached remotely in proxy caches. A file cached at a remote proxy can have an 
expiration time associated with it indicating when the object should no longer 
be served. One problem with caching content in proxy caches is that standard 
protocols don’t allow a server to pre-emptively contact proxy caches in order 
to notify the caches that a file has changed. Since expiration times are often 
difficult to guess precisely, a proxy cache may continue to serve stale data if 
an object changes before its assigned expiration time. Alternatively, if a server 
assigns expiration times conservatively so that objects usually change long after 
their assigned expiration times have expired, the server is likely to receive more 
requests for updates from proxy caches than are needed. 

Encrypting Web pages via SSL can consume significant CPU cycles. In order
to reduce overhead, only essential information should be encrypted. A mistake
sites will often make is to encrypt not only text containing confidential
information but embedded images as well. This can increase the CPU overhead
significantly since a Web page might contain several embedded images, and per- 
haps none of them contain confidential information. 

Although data encrypted via SSL is generally not cached within the network, 
such data can be cached in browsers if an expiration time is provided. Many sites 
have common navigation bars, buttons, logos, etc. which are used on several 
different pages. If these entities need to be encrypted, expiration times should 
be included in order to allow them to be cached in browsers. This will reduce the 
number of SSL requests to the site for the cached objects. While browser caching 
of nonencrypted objects also can improve performance, caching of encrypted 
objects can reduce server load more significantly because of the high overhead 
of SSL requests. 

The organization of content and the design of Web pages can adversely affect 
performance. The HTTP protocol used to retrieve Web page content supports 
keep-alive sockets, a method allowing the client requesting content to reuse the 
same connection to the Web server for subsequent content requests once the 
initial request has been satisfied. Reusing sockets can avoid the overhead of ne- 
gotiating SSL keys and network connections. If a page is designed to draw content 
from more than one server, this can cause the client to have to perform a DNS 
lookup for the various servers referenced, close previously opened connections 
and establish new connections to other servers. If the content can be located on 
the same server, all of this overhead can be avoided. Keep-alive sockets can also 
be closed prematurely by improperly tuned servers, where the session timeout 
is set too low, causing the server to close the socket unnecessarily and forcing the 
client to reestablish a connection. 

Fig. 3. Home page for the 1996 Olympic Games Web site. Clients must navigate 
to other Web pages in order to obtain useful information 

Fig. 4. Home page for the 1998 Olympic Games Web site containing significant 
useful information 

Because the client (e.g., browser) is able to cache content, page 
designers should ensure that the same content (e.g., menu GIFs) is requested using 
the same URLs, thereby allowing the request to be satisfied locally from the 
cache. We have seen cases where the main page and search page presented the 
same images (e.g., menus, logos), but each was drawn from a different location on 
the server, forcing the client to re-request the same content. Menus often comprise 
multiple images, one for each selection. Populating the menu therefore requires 
the client to make multiple requests to receive the individual menu images. Even 
if the content can be served from the client's cache, a request may be sent from 
the client to the server to check that the cached copy is still valid. By 
consolidating multiple menu images into one larger image and employing an 
image map to determine which portion of the menu bar has been selected, only 
one request is required to retrieve the menu (or to validate that the cached menu is 
still valid), saving on network transmissions and relieving the server by reducing 
the number of requests it must satisfy. 

We have seen many Web pages comprising 20 or more components, often 
resulting in downloads of more than 100,000 bytes. While these pages perform well 
for developers connected by 100 Mbit/s LANs to local servers, consider the 
typical customer attaching remotely through the Internet via a 28,800 bps modem and 
able to retrieve less than 4000 bytes per second. Often, changing the size or 
number of colors employed by images can dramatically improve Web 
page performance. By paying more attention to the ink-to-information ratio, 
one can often improve performance and simplify the presentation of information 
without sacrificing its beauty. 
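
To make the gap concrete, the arithmetic can be sketched as follows. The 100,000-byte page and the 4000 bytes-per-second modem rate come from the text; reading the LAN speed as 100 Mbit/s, and ignoring latency and protocol overhead, are simplifying assumptions of this sketch.

```python
def download_seconds(page_bytes: int, bytes_per_second: float) -> float:
    """Idealized transfer time: payload size divided by sustained throughput."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return page_bytes / bytes_per_second


PAGE_BYTES = 100_000                 # page size quoted in the text
MODEM_RATE = 4_000                   # bytes/s, upper bound quoted in the text
LAN_RATE = 100_000_000 / 8           # bytes/s, assuming a 100 Mbit/s LAN

modem_time = download_seconds(PAGE_BYTES, MODEM_RATE)  # 25 seconds
lan_time = download_seconds(PAGE_BYTES, LAN_RATE)      # 8 milliseconds
```

Under these assumptions the modem user waits at least 25 seconds for a page that the developer sees in well under a tenth of a second.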

Designing Web pages to be effective is an art requiring the authors to strike 
a balance between information, aesthetics and performance considerations. The 
application or purpose of the Web page being served plays a large part in 
selecting the “best” design for a Web page. For example, the introductory page for 
a Web site provides the jumping-off point for accessing the site's features. Is there 
really a need to have more information than can be viewed without scrolling? If 
the page is not carefully coded, retrieval of information that cannot be viewed in the visible 
portion of the browser can cause the page to be held pending retrieval of size 
and placement information before the browser can render the page. This leaves 
customers staring at a blank or partially formed page while they wait to use 
the Web page. Tags labeling page components with their dimensions allow the 
browser to begin page rendering as other components are retrieved. By using 
text in combination with graphic buttons, a page can be made navigable before 
all the images are retrieved. Even in cases where lots of information needs to be 
retrieved for a Web page, the design can make the Web page useful before all 
information is retrieved. Consider auction sites that are effective by presenting 
the most commonly sought information (e.g., current bid, auction close, brief 
item description, seller) in the visible portion of the browser while the images of 
the items being auctioned and supporting bid histories are retrieved below the 
visible portion. By presenting customers with the information they seek while 
the remainder is retrieved, the customer switches from waiting to comprehending 
(effectively allowing use of think time to retrieve the supporting information). 
Though it still takes time to completely retrieve the page, the customers will 
not perceive poor performance because they can begin using the page soon after 
making their retrieval requests. 

When engineering highly accessed Web sites for performance, it is important 
to understand the end-to-end dynamics of Web page retrievals. Though 
components can often be retrieved concurrently, the schedule by which they are 
retrieved and the ability of the receiving application to render the content play 
a large part in customer-perceived performance. 



Abstract. In this work, we outline more than a hundred characteristics 
and attributes for the domain of academic sites in order to analyze the 
quality requirement tree and a way to specify its elements. These elements are 
used in a quantitative methodology for assessment, comparison, and 
ranking processes. The proposed Web-site Quality Evaluation 
Methodology (QEM) can be a useful approach to assess quality in 
different phases of a Web product life cycle. In the academic study, we 
identified three different evaluation audiences among site 
visitors: current and prospective students, academic personnel, and 
research sponsors. In addition, this work presents a 
hierarchical and descriptive specification framework for characteristics, 
subcharacteristics and attributes. This framework is a key underlying 
piece in the construction of a hyperdocumented evaluation tool. Finally, 
some results are presented and concluding remarks are discussed. 



1 Introduction 

The age of Web artifacts in domains such as academic, museum, and electronic commerce 
applications ranges on average from three years for the latter to five years for the 
former. In addition, existing sites in these domains are not just document oriented but 
also application oriented and, as a well-known consequence, they are increasingly 
becoming complex systems. Hence, in order to understand, control, and improve the 
quality of Web applications we should unavoidably use Web engineering methods, 
models, and techniques [1]. In this direction, we have proposed the use of Web-site 
QEM [12, 14] as a systematic and quantitative approach to assess product 
quality in the different phases of a Web lifecycle. The core model and procedures for 
the aggregation of characteristics and attributes are based on a multi-attribute 
multi-criteria scoring approach [2]. 

Traditionally, evaluation methods and techniques have been categorized as 
qualitative or quantitative. Even though software assessment has more than three 
decades as a discipline [4, 7, 8], the systematic and quantitative quality evaluation of 
Hypermedia applications and, particularly, the evaluation of Web applications is 
a rather recent and frequently neglected issue. In the last four years, quantitative 
surveys and domain-specific descriptive evaluations have emerged [6, 9]. However, in 
this direction, we need a flexible, engineering-based methodology and tools to assist 
evaluators in the assessment process of somewhat complex Web quality 
requirements. 

Specifically, when using the Web-site QEM methodology we take into account a 
set of phases and activities. The main technical process steps can be summarized as 
follows: (a) selection of an evaluation domain; (b) determination of assessment goals 
and the user standpoint; (c) definition and specification of quality requirements; (d) 
definition and implementation of the elementary evaluation; (e) definition and 
implementation of the partial/global evaluation; and (f) analysis of outcomes and 
recommendations. 

In order to illustrate aspects of steps (c) and (d), we include some models and 
results of the case study about academic sites [14]. With regard to the quality 
characteristics and attributes selected for assessment purposes, about eighty direct or indirect 
attributes were found in the process. We grouped and categorized website 
subcharacteristics and attributes starting from six standard characteristics [5], which 
describe software quality requirements with minimal overlap. As stated in this 
standard, software quality may be evaluated in general by the following 
characteristics: usability, functionality, reliability, efficiency, portability, and 
maintainability. These high-level characteristics provide a conceptual foundation (or 
model) for further refinement and description of quality. However, the relative 
importance of each characteristic and attribute in the quality requirement tree varies 
depending on the user standpoint and application domain considered. The above ISO 
standard also defines three views of quality: the users', developers', and managers' views. 
Specifically, in the academic domain there are three general audiences regarding the 
user (visitor) view, namely: current and prospective students (and visitors like 
parents), academic personnel such as researchers and professors, and research 
sponsors [11]. Thus, visitors are mainly concerned with using the site, that is to say, with its 
specific user-oriented content and functionality, its searching and browsing functions, 
its feedback and aesthetic features, its reliability, and its performance; ultimately, they are 
interested in its quality in use. However, maintainability and portability are not 
visitors' (end users') concerns. (Some student-oriented questionnaires were conducted 
to help us determine the relative importance of characteristics, subcharacteristics, 
and attributes.) 

The final aim of the academic Web case study was to assess the level of 
accomplishment of required characteristics such as usability, functionality, reliability, 
and efficiency, comparing partial and global preferences. This allowed us to analyze 
and draw conclusions about the state of the art of academic site quality from the 
current and prospective students' point of view. 

The structure of this paper is as follows: In Section 2, general indications and 
assumptions for the academic study are made. In Section 3, we specify intervening 
quality characteristics and attributes. A hierarchical and descriptive specification 
framework is discussed in Section 4; in addition, some characteristics and attributes 
are modeled using this framework. Finally, some partial outcomes are analyzed, 
and concluding remarks are considered. 



2 Some Considerations for the Academic Study 



One of the primary goals for this assessment was to understand “the current level of 
fulfillment of quality characteristics given a set of requirements regarding the 
prospective and current student view” [14]. 

In order to prepare the study, we selected six operational academic sites that were four 
years old on average at the time of the evaluation. On the one hand, the chosen sites 
had to be typical and well-known academic organizations. (The study included the 
following sites: Stanford University (US, http://www.stanford.edu), the University 
of Chile (http://www.uchile.cl), the National University of Singapore 
(http://www.nus.sg), the University of Technology, Sydney (Australia, 
http://www.uts.edu.au), the Polytechnic University of Catalonia (Spain, 
http://www.upc.es), and the University of Quebec at Montreal (Canada, 
http://www.uqam.ca).) 

Fig. 1 shows a screenshot of a home page and highlights some attributes. There 
are many such attributes, both general and domain specific, that contribute to Web 
quality, so designers should take them into account when building for intended 
audiences (in the next section, we see the whole list of intervening attributes and 
characteristics). 

Fig. 1. A partial view of the UTS home page where some attributes are highlighted 
(callouts in the figure: 1.1.1.2 Table of Contents; 1.1.1.3 Alphabetical Index; 1.1.4; 
1.4.2 What’s New Feature; 2.1.1.2 Global Search) 



On the other hand, the data collection process can be done manually, 
semi-automatically, or automatically. Most of the attribute values were collected manually 
because there was no other way to do it. However, automatic data collection is in 
many cases the most reliable and almost the only mechanism to collect data for given 
attributes. This was the case when measuring the Dangling Links, Orphan Pages, Image 
Title, and Quick Pages attributes, among others. Finally, at the time of data collection 
(which began on January 22 and finished on February 22, 1999), we did not perceive 
changes in these Web sites that could have affected the evaluation process. 



3 Outlining the Quality Requirement Tree 



In this section, we outline over a hundred and twenty quality characteristics and 
attributes for the academic site domain. Among them, about eighty were directly or 
indirectly measurable. The primary goal is to classify and group in a requirement tree 
the elements that might be part of a quantitative evaluation, comparison, and ranking 
process. As previously said, to follow a well-known standard [5], we use its 
high-level quality characteristics such as usability, functionality, reliability, and efficiency. 
These characteristics give evaluators a quality model and provide a baseline for 
further decomposition. A quality characteristic can be decomposed into multiple levels 
of subcharacteristics, and in turn, a subcharacteristic can be refined into a set of 
measurable attributes. 
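
The decomposition just described can be pictured as a simple tree in which leaves are the measurable attributes. The following Python sketch is only an illustration under that reading: the `Node` class and its methods are hypothetical names, and the sample data reproduces the Efficiency branch of the requirement tree given in Fig. 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A characteristic, subcharacteristic, or attribute of the requirement tree."""
    code: str
    title: str
    children: list["Node"] = field(default_factory=list)

    def is_attribute(self) -> bool:
        # In this sketch, a node without children is a measurable attribute.
        return not self.children

    def attributes(self) -> list["Node"]:
        """Collect all leaf attributes below this node, in tree order."""
        if self.is_attribute():
            return [self]
        leaves: list["Node"] = []
        for child in self.children:
            leaves.extend(child.attributes())
        return leaves


efficiency = Node("4", "Efficiency", [
    Node("4.1", "Performance", [Node("4.1.1", "Quick Pages")]),
    Node("4.2", "Accessibility", [
        Node("4.2.1", "Information Accessibility", [
            Node("4.2.1.1", "Support for text-only version"),
            Node("4.2.1.2", "Readability by deactivating the Browser Image Feature", [
                Node("4.2.1.2.1", "Image Title"),
                Node("4.2.1.2.2", "Global Readability"),
            ]),
        ]),
        Node("4.2.2", "Window Accessibility", [
            Node("4.2.2.1", "Number of panes regarding frames"),
            Node("4.2.2.2", "Non-frame Version"),
        ]),
    ]),
])

attribute_codes = [node.code for node in efficiency.attributes()]
```

Walking the tree this way yields exactly the items that need a metric and an elementary criterion; inner nodes only aggregate the results of their children.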

In order to effectively select quality characteristics and attributes for evaluation 
purposes, we should consider specific kinds of users [13]. Specifically, in the academic 
domain, there are three different audiences regarding the visitor standpoint, as studied 
elsewhere [11]. (This audience-oriented division was clearly established in the 
structure of the UTS site; see, e.g., the table of contents at the given URL.) 



1. Usability 
  1.1 Global Site Understandability 
    1.1.1 Global Organization Scheme 
      1.1.1.1 Site Map 
      1.1.1.2 Table of Contents 
      1.1.1.3 Alphabetical Index 
    1.1.2 Quality of Labeling System 
    1.1.3 Student-oriented Guided Tour 
    1.1.4 Image Map (Campus/Buildings) 
  1.2 Feedback and Help Features 
    1.2.1 Quality of Help Features 
      1.2.1.1 Student-oriented Explanatory Help 
      1.2.1.2 Search Help 
    1.2.2 Web-site Last Update Indicator 
      1.2.2.1 Global 
      1.2.2.2 Scoped (per sub-site or page) 
    1.2.3 Addresses Directory 
      1.2.3.1 E-mail Directory 
      1.2.3.2 Phone-Fax Directory 
      1.2.3.3 Post mail Directory 
    1.2.4 FAQ Feature 
    1.2.5 Form-based Feedback 
      1.2.5.1 Questionnaire Feature 
      1.2.5.2 Guest Book 
      1.2.5.3 Comments 
  1.3 Interface and Aesthetic Features 
    1.3.1 Cohesiveness by Grouping Main Control Objects 
    1.3.2 Presentation Permanence and Stability of Main Controls 
      1.3.2.1 Direct Controls Permanence 
      1.3.2.2 Indirect Controls Permanence 
      1.3.2.3 Stability 
    1.3.3 Style Issues 
      1.3.3.1 Link Color Style Uniformity 
      1.3.3.2 Global Style Uniformity 
      1.3.3.3 Global Style Guide 
    1.3.4 Aesthetic Preference 
  1.4 Miscellaneous Features 
    1.4.1 Foreign Language Support (multilingual site) 
    1.4.2 What’s New Feature 
    1.4.3 Screen Resolution Indicator 

2. Functionality 
  2.1 Searching and Retrieving Issues 
    2.1.1 Web-site Search Mechanisms 
      2.1.1.1 Scoped Search 
        2.1.1.1.1 People Search 
        2.1.1.1.2 Course Search 
        2.1.1.1.3 Academic Unit Search 
      2.1.1.2 Global Search 
    2.1.2 Retrieve Mechanisms 
      2.1.2.1 Level of Retrieving Customization 
      2.1.2.2 Level of Retrieving Feedback 
  2.2 Navigation and Browsing Issues 
    2.2.1 Navigability 
      2.2.1.1 Orientation 
        2.2.1.1.1 Indicator of Path 
        2.2.1.1.2 Label of Current Position 
      2.2.1.2 Average of Links per Page 
    2.2.2 Navigational Control Objects 
      2.2.2.1 Presentation Permanence and Stability of Contextual (sub-site) Controls 
        2.2.2.1.1 Contextual Controls Permanence 
        2.2.2.1.2 Contextual Controls Stability 
      2.2.2.2 Level of Scrolling 
        2.2.2.2.1 Vertical Scrolling 
        2.2.2.2.2 Horizontal Scrolling 
    2.2.3 Navigational Prediction 
      2.2.3.1 Link Title 
      2.2.3.2 Quality of Link Phrase 
  2.3 Domain-related Student Features 
    2.3.1 Content Relevancy 
      2.3.1.1 Academic Unit Information 
        2.3.1.1.1 Academic Unit Index 
        2.3.1.1.2 Academic Unit subsite 
      2.3.1.2 Enrollment Information 
        2.3.1.2.1 Entry Requirement Information 
        2.3.1.2.2 Form Fill/Download 
      2.3.1.3 Degree Information 
        2.3.1.3.1 Degree Index 
        2.3.1.3.2 Degree Description 
        2.3.1.3.3 Degree Plan/Course Offering 
        2.3.1.3.4 Course Description 
          2.3.1.3.4.1 Comments 
          2.3.1.3.4.2 Syllabus 
          2.3.1.3.4.3 Scheduling 
      2.3.1.4 Student Services Information 
        2.3.1.4.1 Services Index 
        2.3.1.4.2 Healthcare Information 
        2.3.1.4.3 Scholarship Information 
        2.3.1.4.4 Housing Information 
        2.3.1.4.5 Cultural/Sport Information 
      2.3.1.5 Academic Infrastructure Information 
        2.3.1.5.1 Library Information 
        2.3.1.5.2 Laboratory Information 
        2.3.1.5.3 Research Results Information 
    2.3.2 On-line Services 
      2.3.2.1 Grade/Fees on-line Information 
      2.3.2.2 Web Service 
      2.3.2.3 FTP Service 
      2.3.2.4 News Group Service 

3. Site Reliability 
  3.1 Non-deficiency 
    3.1.1 Link Errors 
      3.1.1.1 Dangling Links 
      3.1.1.2 Invalid Links 
      3.1.1.3 Unimplemented Links 
    3.1.2 Miscellaneous Errors or Drawbacks 
      3.1.2.1 Deficiencies or absent features due to different browsers 
      3.1.2.2 Deficiencies or unexpected results (e.g. non-trapped search errors, frame 
              problems, etc.) independent of browsers 
      3.1.2.3 Orphan Pages 
      3.1.2.4 Destination Nodes (unexpectedly) under Construction 
    3.1.3 Spelling Errors 

4. Efficiency 
  4.1 Performance 
    4.1.1 Quick Pages 
  4.2 Accessibility 
    4.2.1 Information Accessibility 
      4.2.1.1 Support for text-only version 
      4.2.1.2 Readability by deactivating the Browser Image Feature 
        4.2.1.2.1 Image Title 
        4.2.1.2.2 Global Readability 
    4.2.2 Window Accessibility 
      4.2.2.1 Number of panes regarding frames 
      4.2.2.2 Non-frame Version 



Fig. 2. Quality requirement tree for academic sites. The use of italic style is for attributes 

Fig. 2 outlines the major characteristics, subcharacteristics, and attributes 
regarding current and prospective students. As in the museum case study 
[12], highest level characteristics such as maintainability and portability were not 
included in the requirements for end users. Below, we comment on some 
characteristics and attributes and on the decomposition mechanism. 

The Usability characteristic is decomposed into subfactors such as Global Site 
Understandability, Feedback and Help Features, Interface and Aesthetic Features, 
and Miscellaneous Features. The Functionality characteristic is split up into Searching and 
Retrieving Issues, Navigation and Browsing Issues, and Domain-related Student 
Features. The same decomposition mechanism is applied to the Reliability and Efficiency 
characteristics. For instance, the Efficiency characteristic is decomposed into the 
Performance and Accessibility subcharacteristics. (A hierarchical and descriptive 
specification framework will be presented in the next section.) 

The Global Site Understandability subcharacteristic (within Usability) is 
in turn split up into the Global Organization Scheme subcharacteristic and into quantifiable 
attributes such as Quality of Labeling, Student-oriented Guided Tours, and Campus 
Image Map. However, the Global Organization Scheme subcharacteristic is still too 
general to be directly measurable, so attributes such as Site Map, Table of Contents, 
and Alphabetical Index are derived from it. 

Focusing on the Domain-related Student Features subcharacteristic (within 
Functionality), we have also observed two main subcharacteristics, namely: Content 
Relevancy and On-line Services. As the reader can see, we evaluate aspects ranging 
from academic units, courses, enrollment and services information, to FTP, news 
groups, and Web publication services provided to undergraduate and graduate students. 



4 A Hierarchical and Descriptive Specification Framework 

In order to document the main information yielded in the different processes of the 
methodology, we follow a hierarchical and descriptive specification framework, as 
shown in Fig. 3. Specific information about the definition of attributes, subcharacteristics 
and characteristics, as well as metrics and elementary preference criteria, scoring 
model components and partial/global criteria, and assessment results, is recorded. 
The WebQEM_Tool (an ongoing evaluation tool) supports this Web-enabled 
specification framework, and is intended to assist evaluators in the editing, calculating 
and hyperdocumenting activities of Web-site QEM. 

Let us comment on some items of the Attribute template (Fig. 3). The codes in the 
template are those of the requirement tree shown in Fig. 2. The codes are used to 
anchor other pieces of information and also as part of primary keys in databases. The 
definition item is an empirical and qualitative statement about the attribute (in 
addition, authoritative comments or useful references can be included). The metric 
and parameters item anchors a template with information on the selected metric 
criterion, the expected and planned values for the metric, and measurement dates, among 
others. The data collection type item records whether the data are gathered manually 
(observationally) or automatically, as commented in Section 2. If this process was 
performed automatically, information on the tool used is recorded. The elementary 
criterion type item documents the specific preference criterion type and function. (The 
elementary preference for a given user view means the level of fulfillment of that 
elementary requirement, as modeled by the attribute.) The criterion type can be absolute 
or relative, with a discrete or continuous variable. For example, absolute and discrete 
variable criteria are classified into binary, multi-level, subset-structured, and 
multi-variable criteria. The preference scale item links an image, which shows a reduced 
graphic of the mapping between the metric value [3] and the elementary preference 
value [2]. Finally, we have the corresponding fields to record the attribute weight, 
which means the relative importance of that attribute in the group. In addition, we 
record the elementary preference value for one or more assessed sites in a given 
evaluation project. 

Next, we use some fields of the specification templates to exemplify one 
characteristic and five attributes from Fig. 2. 
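
As a rough illustration of how such an Attribute template might be held in a program or database row, the sketch below models a subset of its fields as a Python dataclass. The class name, field names, and the `primary_key` helper (including the project identifier) are assumptions made for this example and do not describe the WebQEM_Tool itself; the sample values are those reported for the Dangling Links attribute in this paper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeSpec:
    """Hypothetical record mirroring some fields of the Attribute template."""
    title: str
    code: str                                  # code from the requirement tree
    highest_level_characteristic: str
    supercharacteristic: str
    definition: str = ""
    data_collection_type: str = ""             # e.g. observational or automated
    employed_tool: Optional[str] = None        # only for automated collection
    elementary_criterion_type: str = ""
    weight: Optional[float] = None             # relative importance in its group
    preference_values: dict[str, float] = field(default_factory=dict)

    def primary_key(self, project: str) -> tuple[str, str]:
        # The tree code serves as part of the key; the project id is an assumption.
        return (project, self.code)


dangling_links = AttributeSpec(
    title="Dangling Links",
    code="3.1.1.1",
    highest_level_characteristic="Reliability (3)",
    supercharacteristic="Link Errors (3.1.1)",
    data_collection_type="Automated",
    employed_tool="SiteSweeper 2.0",
    elementary_criterion_type="absolute, continuous normalized variable",
    preference_values={"NUS": 68.06},
)
key = dangling_links.primary_key("academic-1999")
```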



Title:    Code:    Type: Characteristic 
Subcharacteristic/s (code/s): 
Definition / Comments: 
Model to determine the Global Computation: 
Employed Tool: 
Preference Scale: 
Arithmetic / Logic Operator: 
Calculated Preference Value/s: 
Weight: 
Example/s: 


Title:    Code:    Type: Attribute 
Highest level Characteristic (code): 
Supercharacteristic (code): 
Definition / Comments: 
Template of Metric and Parameters: 
Data Collection Type:    (Employed Tool:) 
Elementary Criterion Type: 
Preference Scale: 
Preference Value/s:    Weight: 
Example/s: 


Title:    Code:    Type: Subcharacteristic 
Supercharacteristic (code): 
Subcharacteristic/s (code/s):    Attribute/s (code/s): 
Definition / Comments: 
Model to determine the Partial Computation: 
Employed Tool: 
Calculated Preference Value/s: 
Weight: 
Arithmetic / Logic Operator: 
Preference Scale: 
Example/s: 





Fig. 3. Templates to specify highest level characteristics, subcharacteristics, and attributes 



Title: Usability; Code: 1; Type: Characteristic 

Subcharacteristic/s: Global Site Understandability (1.1), Feedback and Help Features 
(1.2), Interface and Aesthetic Features (1.3), Miscellaneous Features (1.4). 

Definition / Comments: It is a high-level product quality characteristic that can be 
calculated from the appropriate aggregation of direct and/or indirect metrics. An 
empirical statement is given by the ISO standard [5]: “A set of attributes that bear on 
the effort needed for use, and on the individual assessment of such use, by a stated or 
implied set of users”. 

It is also worth citing the wider definition given by ISO in the 
ISO/IEC 9126-1 draft: “The capability of the software product to be 
understood, learned, used and attractive to the user, when used under specified 
conditions”. 

Model to determine the Global Computation: Logic Scoring of Preference model [2]; 
Employed Tool/s: WebQEM_Tool. 

Preference Scale: (graphic scale running up to 100%, with a mark at 60%) 

Weight: 0.3 

Logic Operator: Conjunctive operator C-- (see [14]). 

Calculated Preference Value/s: Stanford University = 71.93% of the quality 
preference for Usability; UTS = 80.08%; UQAM = 60.94%; UPC = 76.18%; UChile 
= 51.01%; and NUS = 57.71%. 

Example/s: It has been used as a constituent part of the quality requirements in three 
case studies and a survey, as well as in two Web development projects. 
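
The Logic Scoring of Preference model aggregates weighted preferences with logic operators that are commonly realized as weighted power means. The sketch below shows that general form only. The exponent `r` that corresponds to the C-- operator is not given in this text, so it is left as a caller-supplied parameter, and the inputs in the usage lines are hypothetical values rather than data from the study.

```python
def weighted_power_mean(preferences: list[float], weights: list[float], r: float) -> float:
    """Aggregate preferences (0..100) as (sum_i w_i * p_i**r) ** (1/r).

    r = 1 gives the weighted arithmetic mean; r < 1 moves toward conjunction,
    so low inputs pull the result down more strongly.
    """
    if not preferences or len(preferences) != len(weights):
        raise ValueError("preferences and weights must be non-empty and of equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if r == 0:
        raise ValueError("r = 0 (geometric-mean limit) is not handled in this sketch")
    if r < 0 and any(p == 0 for p in preferences):
        return 0.0  # a zero input forces the result to zero for negative exponents
    return sum(w * p ** r for p, w in zip(preferences, weights)) ** (1.0 / r)


# Hypothetical inputs: two subcharacteristic preferences with equal weights.
neutral = weighted_power_mean([60.0, 100.0], [0.5, 0.5], r=1.0)
conjunctive = weighted_power_mean([60.0, 100.0], [0.5, 0.5], r=0.5)
```

With an exponent below 1 the aggregate falls under the arithmetic mean, which is how a conjunctive operator penalizes a site that does well on one subcharacteristic but poorly on another.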



Title: Site Map; Code: 1.1.1.1; Type: Attribute 



Specifying Quality Characteristics and Attributes for Websites 273 



Highest level characteristic: Usability (1); Supercharacteristic: Global Organization 
Scheme (1.1.1). 

Definition / Comments: A site map is a graphical representation that shows 
the structure or architecture (often hierarchical) of the whole website. Hence, a site 
map presents the information architecture in a way that goes beyond textual 
representation, while frequently allowing direct navigation from its anchored 
elements, as indexes and tables of contents do. 

Unfortunately, the term is often used interchangeably with table of contents and 
sometimes with index. As stated by Rosenfeld et al. [15], the above definition 
excludes tables of contents and indexes because of the use of graphical components to 
enhance the visualization and aesthetic appeal. 

Elementary Criterion Type: It is a binary discrete absolute criterion that 
assesses the availability of the mechanism: the value 0 means it is not available 
at all, and the value 1 means it is available. 

Preference Scale: (graphic scale from 0% to 100%, with a mark at 60%) 

Data Collection Type: Observational 

Preference Value/s: see the 1.1.1.1 row in Table 1. 

Example/s: The UPC site map (http://www.upc.es/catala/index/index.htm). 
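
A discrete absolute criterion of this kind can be sketched as a lookup from the observed level to an elementary preference. The function name and the mapping tables below are assumptions for illustration: the binary table assumes that "not available" maps to 0% and "available" to 100%, and the middle value of the three-level table is a placeholder, since the text does not state the preference assigned to intermediate levels.

```python
def discrete_preference(level: int, level_to_preference: dict[int, float]) -> float:
    """Map an observed discrete level to an elementary preference in percent."""
    if level not in level_to_preference:
        raise ValueError(f"level {level} is not defined for this criterion")
    return level_to_preference[level]


# Binary criterion (e.g. availability of a site map); mapping is an assumption.
BINARY = {0: 0.0, 1: 100.0}

# Multi-level criterion with three levels; the value for level 1 is a placeholder.
THREE_LEVEL_PLACEHOLDER = {0: 0.0, 1: 50.0, 2: 100.0}

site_map_available = discrete_preference(1, BINARY)
site_map_missing = discrete_preference(0, BINARY)
```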



Title: People Search; Code: 2.1.1.1.1; Type: Attribute 

Highest level characteristic: Functionality (2); Supercharacteristic: Scoped Search 
(2.1.1.1) 

Definition / Comments: “Sometimes, special areas of a site are sufficiently coherent 
and distinct from the rest of the site that it makes sense to offer a scoped search that is 
restricted to only search that subsite” [10]. 

For instance, for a student audience it can often be better to provide both scoped 
search and global search; i.e., a scoped search can be necessary to find a course or a 
person by surname or email, while a global search can also be necessary to find 
information in the whole site. 

Elementary Criterion Type: It is a multi-level discrete absolute criterion defined as a 
subset, where: 0 = no search mechanism is available; 1 = search mechanism available 
by name/surname; 2 = 1 + expanded search: search mechanism by academic unit 
and/or subject area or discipline, and/or phone (or other filters). 

Preference Scale: (graphic scale from 0% to 100%, with a mark at 60%) 

Data Collection Type: Observational 

Preference Value/s: see the 2.1.1.1.1 row in Table 1. 

Example/s: An outstanding example was the Stanford people search 
(http://sin.stanford.edu/), as illustrated in Fig. 4; the computed elementary preference 
was 100%. Other examples were at the University of Chile and at UQAM. 

Fig. 4. The People Search and Retrieval Customization mechanisms at Stanford University 



Title: Dangling Links; Code: 3.1.1.1; Type: Attribute 

Highest level characteristic: Reliability (3); Supercharacteristic: Link Errors (3.1.1) 

Definition / Comments: It represents found links that lead to missing destination pages 
(also known as broken links). “Users get irritated when they attempt to go somewhere, 
only to get their reward snatched away at the last moment by a 404 or other 
incomprehensible error message” [9] (http://www.useit.com/alertbox/980614.html). 

Elementary Criterion Type: It is an absolute and continuous normalized-variable 
criterion, where BL = number of broken links found, and TL = total number of links 
of the site. The formula to compute the preference is: X = 100 - (BL * 100/TL) * 
10; where, if X < 0 then X = 0. 

Preference Scale: (graphic scale from 0% to 100%, with a mark at 60%) 

Data Collection Type: Automated. 

Employed Tool: SiteSweeper 2.0 

Preference Value/s: see the 3.1.1.1 row in Table 1. 

Example/s: The National University of Singapore produced a preference of 68.06%. 
The value was computed from the above formula: 100 - ((970*100)/30883) * 10 = 
68.06%. 
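
The elementary criterion for Dangling Links translates directly into a small function. The sketch below follows the formula as printed, with BL and TL renamed to descriptive parameters; the function name and the input validation are additions made for this example. Evaluated as printed with BL = 970 and TL = 30883, the formula yields approximately 68.59, slightly different from the 68.06% reported in the example; the text does not say which input or rounding accounts for the gap.

```python
def dangling_links_preference(broken_links: int, total_links: int) -> float:
    """Elementary preference X = 100 - (BL * 100 / TL) * 10, floored at 0."""
    if total_links <= 0:
        raise ValueError("total_links must be positive")
    if not 0 <= broken_links <= total_links:
        raise ValueError("broken_links must lie between 0 and total_links")
    x = 100.0 - (broken_links * 100.0 / total_links) * 10.0
    return max(x, 0.0)


# Figures quoted in the example for the National University of Singapore.
nus = dangling_links_preference(970, 30883)
```

Because the percentage of broken links is multiplied by ten, a site in which 10% or more of the links are broken receives a preference of 0.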



Title: Orphan Pages; Code: 3.1.2.3; Type: Attribute 

Highest level characteristic: Reliability (3); Supercharacteristic: Miscellaneous 
Errors or Drawbacks (3.1.2) 

Definition / Comments: It represents found pages that have no return link to the home 
page or subsites within the same site (also known as dead pages). “Make sure that all 
pages include a clear indication of what Web site they belong to since users may 
access pages directly without coming in through your home page. For the same 
reason, every page should have a link up to your home page as well as some 
indication of where they fit within the structure of your information space” [9] 
(http://www.useit.com/alertbox/9605.html). 

Elementary Criterion Type: It is an absolute and continuous normalized-variable 
criterion, where OP = number of orphan pages found, and TP = number of pages of 
the whole site. The formula to compute the preference is: X = 100 - (OP * 100/TP) 
* 10; where, if X < 0 then X = 0. 

Preference Scale: the same as for Dangling Links (broken links). Data Collection Type: Automated. 

Note: At the time of data collection for the academic study, we did not have a tool 
available, so this attribute did not intervene. Since then, we have built a tool (Website 
MA) that automates this metric along with twelve other metrics. 



Title : Image Title; Code : 4.2.1.2.1; Type : Attribute

Highest level characteristic : Efficiency (4); Supercharacteristic : Readability by
deactivating the Browser Image Feature (4.2.1.2)

Definition / Comments: Alternative text should be provided for each image or
graphic component, since these convey visual information. The attribute measures the
percentage of images for which an <ALT> tag with replacement text is present. It
favors readability when the user disables the browser’s image feature.
However, the measure does not guarantee the quality of the alternative
text: some text may be generated automatically by editing tools such as
Frontpage.

The W3C, in the WAI Accessibility Guidelines [16]
(http://www.w3c.org/TR/WD-WAI-PAGEAUTH), says: “Text is considered accessible to
almost all users since it may be handled by screen readers, non-visual browsers,
Braille readers, etc. It is good practice, as you design a document containing
non-textual information (images, graphics, applets, sounds, etc.) to think about
supplementing that information with textual equivalents wherever possible”.

Elementary Criterion Type : It is an absolute and continuous normalized-variable
criterion, where AAR = number of absent ALT references (in the HTML code) and TAR
= the total number of objects (images) that should reference the ALT property. The
formula to compute the preference is:

X = 100 - (AAR * 100/TAR)

Preference Scale: [Figure: preference scale marked at 0%, 60%, and 100%]

Data Collection Type: Automated. 

Employed Tool: SiteSweeper 2.0 
Preference Value/s : see the 4.2.1.2.1 row in Table 1.

Example/s: For the UTS site, the tool gave us the absence percentage and in that case 
reported “Of the 63.882 inline references on your site that should specify an ALT 
attribute, 11.721 references (18%) are missing the attribute. The missing ALT 
attributes appear on 3.338 different pages”. The resulting elementary preference was 81.65%.
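
The following sketch shows one way such a measurement might be reproduced with Python's standard `html.parser` module. It is an illustration under stated assumptions rather than a description of SiteSweeper: the class `AltCounter` and the function `image_title_preference` are hypothetical names, only `<img>` tags are counted, and an image is treated as missing ALT text only when the attribute is absent altogether.

```python
from html.parser import HTMLParser


class AltCounter(HTMLParser):
    """Count <img> tags and how many of them lack an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0    # TAR: images that should carry ALT text
        self.missing = 0  # AAR: images with no alt attribute at all

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.total += 1
            if "alt" not in dict(attrs):
                self.missing += 1


def image_title_preference(missing: int, total: int) -> float:
    """Elementary preference X = 100 - (AAR * 100 / TAR)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 - (missing * 100 / total)


counter = AltCounter()
counter.feed('<p><img src="logo.gif" alt="University logo"><img src="map.gif"></p>')
print(counter.missing, counter.total)                  # 1 2

# Counts reported for the UTS site: 11,721 missing out of 63,882 references.
print(round(image_title_preference(11721, 63882), 2))  # 81.65
```

The second call uses the counts quoted in the UTS example and reproduces the 81.65% preference reported there.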

Table 1. Partial results of elementary quality preferences for the six academic sites

Code          UPC        Uchile     UTS        NUS        Stanford   UQAM
              Spain      Chile      Australia  Singapore  USA        Canada

1. Usability
1.1.1.1       100        0          0          0          0          0
1.1.1.2       100        0          100        100        100        0
1.1.1.3       0          0          100        0          100        0
1.1.2         90         90         90         80         90         80
1.1.3         0          0          100        0          100        0
1.1.4         100        100        100        100        50         100

2. Functionality
2.1.1.1.1     60         100        60         100        100        100
2.1.1.1.2     0          0          100        0          100        0
2.1.1.1.3     0          0          0          0          100        100
2.1.1.2       60         60         60         0          100        100

3. Reliability
3.1.1.1       0          75.02      74.1       68.06      58.32      0

4. Efficiency
4.1.1         75.3       50.46      82         51.46      100        83.44
4.2.1.2.1     34.38      45.36      81.65      36.22      47.29      53.15


Table 1 shows some results of elementary preferences after implementing the
corresponding criterion function for each attribute of the academic sites. The results
are just elementary values: no aggregation mechanism has yet been applied and no
global outcomes yet produced (the f-step of our methodology, as commented in the
Introduction section). Nevertheless, some relevant analyses can be made.

For instance, we can see that two out of six sites have not resolved the Global
Organization Scheme feature (i.e., they offer neither a Site Map, nor a Table of
Contents, nor an Alphabetical Index). As previously said, when visitors enter
a given home page, particularly for the first time, the availability of at least one of
these attributes might help them gain a quick Global Site Understandability of both
the structure and the content. Likewise, attributes such as Student-oriented Guided
Tours and Campus Image Map might also contribute to global understandability.
For example, only the Stanford University and UTS sites have Student-oriented Guided
Tours; both are excellent tours that achieve 100% of the elementary preference,
but the one at UTS is simply outstanding. Not only does it have a student-oriented
tour, but it also contains a personified guide for each academic unit. (The visitor can
access it under the "For Students" label of the table of contents, through the
"Virtual Open Day" link, as shown in Fig. 1.)
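
The first observation can be checked mechanically against Table 1. The sketch below assumes that rows 1.1.1.1 to 1.1.1.3 are the three Global Organization Scheme attributes; that mapping is inferred from the attribute codes and is not stated explicitly in this section. The variable names are illustrative only.

```python
# Elementary preferences copied from rows 1.1.1.1-1.1.1.3 of Table 1.
# Column order: UPC, Uchile, UTS, NUS, Stanford, UQAM.
SITES = ["UPC", "Uchile", "UTS", "NUS", "Stanford", "UQAM"]
GLOBAL_ORGANIZATION_ROWS = {
    "1.1.1.1": [100, 0, 0, 0, 0, 0],
    "1.1.1.2": [100, 0, 100, 100, 100, 0],
    "1.1.1.3": [0, 0, 100, 0, 100, 0],
}

# A site has not resolved the feature when every one of its attributes scores 0.
unresolved = [
    site
    for i, site in enumerate(SITES)
    if all(row[i] == 0 for row in GLOBAL_ORGANIZATION_ROWS.values())
]
print(unresolved)  # ['Uchile', 'UQAM']
```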

On the other hand, all evaluated sites have the necessary Campus Image Map
attribute (coded 1.1.4); only the Stanford campus image map was not easy to access
and was not well structured (obtaining 50% of the preference). Let us recall that, at
the end of the evaluation and comparison process, a global indicator of quality is
obtained for each selected site on a scale from 0 to 100%. In addition, such a rating
falls into one of three categories or preference levels, namely: unsatisfactory (from 0
to 40%), marginal (from 40 to 60%), and satisfactory (from 60 to 100%). The global
preference can be approximately interpreted as the degree to which the global quality
requirements are satisfied.
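
The three preference levels can be expressed as a small classification function. This is a minimal sketch: the name `preference_level` is hypothetical, and because the stated ranges share their end points, the code assumes that the boundary values 40 and 60 belong to the higher level.

```python
def preference_level(score: float) -> str:
    """Map a preference in [0, 100] to one of the three preference levels."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 40:
        return "unsatisfactory"
    if score < 60:  # assumption: exactly 40 counts as marginal
        return "marginal"
    return "satisfactory"  # assumption: exactly 60 counts as satisfactory


# Sample scores chosen only to exercise each level.
for value in (34.38, 58.32, 81.65):
    print(value, preference_level(value))
```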

Regarding the Functionality characteristic, there are two basic functions for moving
around a site in order to find information, i.e., browsing and searching. In addition,
from the point of view of current and prospective students, scoped search functions as
outlined in the requirement tree are necessary attributes. For instance, we found that
all sites have at least the basic People Search attribute; however, not all
sites have Course Search facilities. In addition, the reader can appreciate the
elementary results for the Dangling Links (3.1.1.1), Quick (static) Pages (4.1.1), and
Image Title (4.2.1.2.1) attributes, which were automated as commented in previous
sections. Finally, a wider analysis of elementary, partial, and global outcomes can be
found in [14].



5 Concluding Remarks 

In this paper, standardized quality characteristics and about eighty directly or
indirectly measurable attributes for websites in the academic domain were
considered. The main goal was to represent quality requirements in order to arrange
the list of characteristics, subcharacteristics, and attributes that can be part of a 
quantitative evaluation, comparison, and ranking process. The proposed Web-site 
QEM methodology, grounded in a logic multi-attribute scoring model and procedures, 
is intended to be a useful tool to evaluate (and eventually compare) the product 
quality in development or operational phases of a Web application lifecycle. 

The evaluation process generates quality indicators or preferences that can be 
easily analyzed, traced, justified, and efficiently employed in recommendation 
activities. The outcomes might be useful to understand, control, and improve the 
quality of Web sites and applications in small, medium, and large-scale projects. 

Besides, we have shown a hierarchical and descriptive specification framework for
characteristics, subcharacteristics, and attributes. We have specified a characteristic
and five attributes for the academic case study following a regular structure, i.e., title,
code, element type, highest level characteristic, super and subcharacteristics,
definition/comments, elementary criterion type, preference scale, data collection type,
template of metrics and parameters (not depicted in this work), weight, and
example components, among others. Moreover, this framework is a key
underlying piece in the construction of a hyperdocumented evaluation tool.
WebQEM_Tool (an evaluation tool under development) supports this Web-enabled
specification framework and is intended to assist evaluators in the editing, calculating,
and hyperdocumenting activities of Web-site QEM in a collaborative way.

Finally, according to the experience gained in the studies (in the museum, academic,
and e-commerce domains) and in quality assurance processes on new projects, we
have observed that many subcharacteristics and attributes can be reused among
different Web domains, considering a specific user view (some others are
unavoidably domain-specific).

Abstract. Despite the rapid evolution of Web technologies and
development tools and skills, most Web sites fail (to varying degrees)
to achieve their true business goals. This is at least partially due to our
inability to effectively define Web acceptance criteria (from both a
client's perspective and a developer's perspective). These criteria cover
those characteristics that the final system must possess, and against 
which the development can be carried out. Examples include broad 
business objectives, and detailed content and functional descriptions, 
but also navigability, user engagement, site evolvability, and especially 
site maintenance. In this paper we consider the need for an improved 
ability to define acceptance criteria for Websites as a target for the 
design and maintenance process. We describe a framework that 
includes dimensions covering both product criteria and organisational 
elements. We also discuss how the various dimensions within this 
framework can be represented using various existing techniques. 



1 Introduction 

There has been a phenomenal recent growth in the development of interactive media 
applications. This is especially true of Web-based development. Despite this rapid 
growth - or possibly as a partial consequence of it - our understanding of the purpose 
and design goals of Websites during development is typically very poor [1,2]. This 
problem is most noticeable in the difficulties that are typically encountered during 
contract and tendering negotiations for the outsourced development of Web-based 
projects. It is not uncommon to find Web projects that are poorly understood, 
resultant sites or applications which do not come close to achieving their desired 
aims, project bids which vary in estimated cost by anything up to an order of 
magnitude, and significant conflicts or dissatisfaction between developers and clients 
over the costs and development results [2]. 

Many of these problems can be traced to an inability to accurately define the needs
of clients in a way that is both understandable to developers and allows those needs
to be tied to the specific technical constructs which underpin the
Web [1,3]. If we consider other developmental domains then we can see the
difference more starkly. For example, in software engineering, it is common practice 
to develop - often in collaboration with the client - a set of "acceptance criteria" which
define the goals of the project. These criteria then provide a target against which the
development must be carried out. Additionally, these acceptance criteria and the
resulting specifications are key elements in techniques for determining the scope and
costs of projects - such as Boehm's development of COCOMO [4] and Basili's work
on the TAME resourcing model [5]. In these cases, although the specific form of the
acceptance criteria may vary - for example they may be represented as a user
requirements document or as a contracted statement of work - the language in which
they are couched is well understood within the profession. Indeed most professions
have developed specific "languages" that are commonly understood and which can be 
used to define the needs and scope of a project. This is not yet true of Web 
development. Although the technical aspects of the Web are well understood, and 
methods for expressing client objectives are evolving, the two have not yet been 
reconciled. 

This is not to say that it is appropriate or even desirable to create a pro-forma and 
associated specification language for defining "Web Requirements Specification". 
Indeed, as we shall discuss later in this paper, the nature of Websites and hence the 
development process is such that this may be counter-productive. We do however 
need to be able to define a target against which the design, implementation and 
maintenance of Web sites can be carried out. For want of a better term, we have in 
this paper referred to this target as the site acceptance criteria - i.e. those elements that 
specify what will result in a Web site or system that is acceptable to the client. 

In the following section we consider in more detail the need for acceptance criteria 
and how they are handled in other domains. We also look at the current situation in 
Web development and comment on the problems that are arising as a result of a lack 
of a common language for defining acceptance criteria. 

We then move on to looking at how such a language might be constructed and 
what form it might take for Web development. In particular we emphasise that for 
Web development the language needs to consider both the product characteristics 
(where research on areas such as usability analysis can be beneficial) as well as 
organisational processes. These processes need to be put into place to cope with the 
incremental and ongoing nature of website evolution. Including these processes in the 
initial consideration, and hence into acceptance criteria, is crucial. We illustrate this 
point by looking at some analogous domains (landscape gardening, city planning, 
etc.) where the goal is not just a "product", but also includes the ongoing process for 
coping with the evolution of the "product". 

We then consider how we tie these elements together into an overall framework for 
defining Website projects. This framework makes use of various disparate areas, 
including work on hypermedia and Web evaluation such as SUE [6], the Technology 
Assessment Method, [7] and conventional software engineering processes and 
standards. 

We finish by acknowledging that we have raised many more questions than 
answers, but have provided an initial research agenda and begun pointing the way 
towards possible resolutions of some of these questions. Even at this early stage, it is 
possible to utilise these ideas in defining Web specifications that are clearer and more 
likely to result in improved systems. 


2 Background

Web development encompasses the creation and maintenance of an increasingly wide 
range of applications, covering a diverse set of needs and potential clients. They 
extend from content-rich Web sites to E-commerce systems; from flexible document 
management systems to workflow and process tools. Although the domains of 
application are very diverse, these applications all have some common characteristics. 
For example, they all utilise rapidly evolving Web technologies to provide solutions 
which evolve over time in a significantly more fine-grained, even organic, manner 
than is typical of more conventional information or software systems. An important
question that this raises is how we develop an understanding of exactly
what form these solutions should take.



2.1 Understanding Client Needs in Web Development 

It is commonly accepted in commercial Web development that the purpose and scope
of Websites are typically very poorly understood. There are a variety of reasons
for this, including a lack of understanding of the Web development
process [8], an evolving understanding of the potential of Web technologies, and
communication breakdowns between clients and developers. This generates a wide
range of potential problems, including:

• Poor quality and unmaintainable applications. If the developers are unable to 
understand and/or express the needs of clients in a way that allows translation into 
specific design solutions, then the resultant Web applications or systems will be 
inherently unable to accurately address these needs. The result will be applications 
which have low quality (in the sense that quality equates to "fitness for purpose"). 
They are also more likely to be difficult to maintain, given that the initial structure 
will be less well suited to the initial needs. 

• Poor scoping and hence planning of development projects. When developers do 
not have a good grasp of the needs of the clients it becomes substantially more 
difficult to determine the scope of development projects, and hence to resource and 
plan the projects. 

• Increased difficulty in providing competitive bids. The vast majority of Web 
development is carried out on a commercially competitive basis. Competitive 
bidding on these projects relies heavily on accurate cost estimation which in turn is 
dependent upon an accurate understanding of the client needs and how these relate
to the technical foundations upon which the application will be built. Without an 
understanding of clients' needs there will be substantial risks of both underbidding 
and overbidding. Conversely, an ability to accurately define client needs is a 
critical element of improved models for resource allocation and cost breakdown - 
which are in turn important for bid preparation, and in assisting clients to evaluate 
bids more objectively and comprehensively. Greater budgetary detail will also 
enhance the effectiveness of contract negotiation. 

It is worth noting in passing that many of the more successful commercial Web 
development organisations have addressed this issue by taking a strongly 
collaborative approach to the identification of client needs. They will often work very
closely with a client not only during the initial discussions, but often well into the
design stages of a project.

Having accepted the importance in the development process of understanding 
client needs, it is useful to consider how client needs are elicited in other development 
domains. 



2.2 Software Specification Process 

In software engineering there are well-established mechanisms for identifying and
recording user needs. For example, the process might typically involve moving from a 
set of client needs to a formalised expression of these as a set of user requirements 
(often recorded as a URD - User Requirements Document). The URD can be 
analysed, and subsequently refined, using a variety of analysis techniques, tools and 
methods to determine possible flaws, missing requirements or ambiguities. The result 
of this is a refined URD that captures the client's view of their needs, and a Software
Requirements Specification (SRS) which captures a technical expression of the 
requirements that can be used as the basis for development. If developed correctly, 
the URD is understandable and acceptable to the client, and the SRS is a consistent 
technical representation of the URD. 

Although the specific process will often vary from this, the basic activities of the 
process and the outcomes are relatively well established. Similarly, we can go one 
step further and look at the elements that are typically considered in a URD, SRS or 
equivalents: functional requirements, performance requirements, interfaces and 
behaviours, and non-functional requirements such as robustness and maintainability. 

The notations and terms for discussing these elements vary, but the terms are still
couched in a common language that is relatively well understood by both clients and
developers. Similarly, there are common expectations about the process even though
the specific activities may vary (being incremental, involving prototyping etc). The
result is a common expectation that the initial stages of the project will result in a
project specification that can be used as a target for a well-defined, and well-bounded,
development project. It is also worth noting that the same requirements are typically
used at the end of the project as the basis for a set of acceptance tests that determine
the acceptability or otherwise of the product that has been developed. This essentially
closes the development loop, allowing both clients and developers to close the
development (or specific stages of the development), something that is typically very
important for contractual development.



2.3 Web Specification 

Unlike software development, and development in most other domains, Web
development typically lacks several key elements. These include: a well-established
process for developing an understanding of client needs; a language which is common
between clients and developers for communicating and representing these needs; and
a clear technique for closing the loop and providing closure for development effort. It
should be recognised that commercial approaches to addressing each of these are
gradually appearing - though not in a consistent or cohesive way (which is




A Framework for Defining Acceptance Criteria for Web Development Projects 283 



critical for a client's understanding of, and ability to manage, a project). Each of these
elements is critical for an effective and manageable development process.

We can gain some insights into this by looking at existing research directions. 
Consider the above discussion about elements of a software specification: functional, 
performance, behavioural, user-interface, non-functional. This categorisation is not 
necessarily appropriate for Web applications - given the inherent differences between 
most Web applications and other types of software applications or systems. We can 
start to identify a parallel set of requirements categories by looking at the nature of 
Web applications and how they might be evaluated. 

For example, work on the SUE methodology [6] has provided a systematic 
approach to the evaluation of hypermedia (including Websites). This method provides 
a broad multi-dimensional analysis of different elements of usability, considering 
criteria such as accessibility, orientation, user control etc. These criteria, and the 
associated evaluation activities, are aimed at identifying possible problems in 
applications, rather than quantifying client needs. They do however provide guidance 
on those aspects of development related to creating a usable application and which are 
worthy of consideration during the initial determination of client needs. Table 1 
shows the usability attributes defined by SUE. 



Table 1. SUE Hypermedia-Specific Usability Attributes

General Principles   Criteria                   Attributes

Efficiency           Accessibility              Access layer soundness
                                                Navigational richness
                     Orientation                Session history soundness
                                                Context observability
                                                Reuse soundness
                     User control availability  Media control availability
                                                Navigational control availability

Learnability         Consistency                Structural Consistency
                                                Dynamic Consistency
                     Predictability             Regularity
                                                Media Interface Soundness
                                                Navigation Interface Soundness
                                                Collection Ordering Coherence
                                                User’s Knowledge Conformance



SUE, however, addresses only a very specific hypermedia-related set of usability
attributes. It does not address issues related to the extent of the required content, ways
in which this content might be maintained over time, the expected functional
behaviour, issues such as security and access control, etc. Similarly, the concepts are
phrased in the language of evaluation and not in a client-understandable language.
This needs to be addressed before we can begin to effectively develop a basis for
specifying Web application acceptance criteria.


3 The Need for a Specification Language

One of the biggest problems currently facing web developers and their clients is the 
lack of an established language or vocabulary for describing Web systems. Much of 
the vocabulary of the developer is tied to specifics of tools and technologies. This is 
quite necessary for following new technological trends and advancements in the field. 
Unfortunately, the client for whom a web application is being developed is not likely 
to have the same vocabulary as the developers for discussing systems and technology. 
They will describe their web requirements in terms that are specific to their particular 
domain of interest - often a particular business domain. These descriptions will often 
rely heavily for their meaning on the nature of the domain, and without a detailed 
knowledge of this domain, the developer runs the risk of misinterpreting these 
descriptions. In addition, such descriptions often do not lend themselves to 
quantifiable objectives, something a developer should seek to establish in order to 
define the scope of the project. An ambiguous, unquantifiable system description is a 
project disaster waiting to happen. 

For a developer, this ambiguity is an unavoidable consequence of trying to map
client needs in a specific domain into acceptance criteria using a universally
understood language (i.e. understandable by both client and developer) and then into
particular implementation technologies. This problem is not specific to web
developers. It is faced in many other domains where clients’ needs must be
interpreted in order to specify a product. Two often-cited examples are software
development and graphic design. Techniques for describing user requirements
developed in many of these fields are applicable to web development. Unfortunately,
on their own they are not sufficient. The fact that many different disciplines provide
input to the Web development process means we cannot rely exclusively on any
particular one, and hence we must borrow from them wherever possible to develop a
specification language for web projects.

Of perhaps greater concern in Web development is the rapidity with which 
technology and tools evolve. In the lifecycle of a single project, a technology can 
become obsolete, or a preferred look-and-feel can become "yesterday's news". Such a
dynamic environment introduces new complexities into a specification language. If a 
project’s scope is not carefully specified then changes in technology or customer 
expectations can lead to difficulties in ever establishing that a project has met the 
stated specifications. For example a non-exact specification such as "must look good" 
is very subjective, and can easily change as tastes change over the period of the 
development lifecycle. 

A similar problem is the change that occurs in clients’ technical understanding
and level of expectation over the life cycle of a project. The rapid growth of the web
and its related technologies makes it difficult even for people in the field to keep
abreast of all that is going on. For clients whose core interest is elsewhere, it is well
nigh impossible. However, the popularity of the Web means that increasingly diverse
groups of people are being drawn into discussions about how the technology can be
applied to benefit them. As these people are exposed to the technology their
knowledge grows, and with it their expectations and requirements also grow. What,
at the beginning of the project, seemed quite impressive to a client has, by the end of
the project, become unsatisfactory. From the developers' perspective, it is essential to
describe their specification in a way that clearly defines their responsibilities in regard
to the scope of the project. 

Before we consider the dimensions of this specification language there is one final 
set of observations that are important to make. This is with regard to the nature of the 
Web development process and how it might influence what we wish to express. 



4 The Form of Web Development 

Much of the language of software specification has an implicit assumption about the 
way in which software development is carried out. Specifications are typically 
predicated on an understanding that there will be a particular software "release" which 
is the target of development. This in turn means that it is reasonable to define just the 
nature of this release (or a specific set of releases). 

Web development typically has a very different development cycle, and as such the
underlying assumptions regarding how systems are specified need to be carefully
examined in the context of this changed process.

4.1 Web Development: Organic Rather Than Defined Releases 

The similarities of web development to fields such as software development and 
graphic design can mask the rather significant differences. As was mentioned briefly
earlier in the paper, Web development tends to differ greatly in that we are no longer
aiming to develop a "finished" product. Rather, we are aiming to create an organic 
entity that starts with an initial consistent structure, but continues to grow and evolve 
over time. This evolution is much finer-grained than the maintenance changes that 
occur with more traditional software products, and tends to be an integral part of the 
life cycle of the product. Compare this to conventional software maintenance, which 
tends to be a coarse-grained response to errors in the products, changes in 
requirements, or a changing environment. 

A major consequence of this is that it becomes no longer appropriate to define a set 
of acceptance criteria for a fixed product. Indeed, the concept of defining a static 
development target is no longer relevant. Yet, despite this, we still need to have a 
basis for both development and evaluation, and probably also contract negotiation. 
How do we define acceptance when there is no stationary target against which we can 
design? To answer this, let us look a little deeper still. 

Most Web development typically involves the establishment of an initial 
information architecture [9] that then supports the evolution of the site. This evolution 
(at least when it is successful) includes a comprehensive integration of the content 
maintenance into the organisational processes of the client. Where this integration 
does not occur, the site rapidly stagnates and ceases to serve a valid function. As 
discussed in [9], in this context a successful development effort would cover: 

• "Clarifies the mission and vision for the site, balancing the needs of its 
sponsoring organization and the needs of its audiences. 

• "Determines what content and functionality the site will contain. 

• "Specifies how users will find information in the site by defining its organization, 
navigation, labeling and searching systems. 

• "Maps out how the site will accommodate change and growth over time." 

It is the last point that provides the fundamental difference
between Web development and development of conventional software systems. It
implies that the project does not have a point of closure. Rather, the criteria for
development cover both the initial framework that must be established, and
procedures for ongoing development that must be put in place.

As a simple example consider the situation where a business sells many different 
types of widgets. They currently have well-established, effective, and well- 
understood processes for managing stock levels, product catalogues, supplier ordering 
etc. They decide that they need to "sell via the internet" in order to remain 
competitive. They contract out the development, specifying the content, look and 
feel, marketing focus and potential target group of the site. The developers create and 
deliver a site that is initially very effective, but rapidly becomes extremely difficult to 
maintain. The product lists and details rapidly change and the site requires continual 
development to keep up to date. The ordering processes are such that the company 
now has two different and incompatible sets of business processes. The problem is 
exacerbated when 6 months later the company decides to change its marketing 
campaign and wishes to modify the entire look and feel of the site. In other words, 
the entire site has not been developed to integrate cleanly with existing content 
databases, business practices and workflows, nor with an understanding of the 
potential for significant changes. 

The implications of this can be better understood by looking at some development 
domains that involve similar evolutionary or organic development. 



4.2 Web Engineering or Web Gardening? 

In order to understand the evolutionary nature of the web development lifecycle it is 
useful to move away from viewing it in relation to software engineering, graphic 
design, or marketing, as is often done. Although perhaps not obvious at first sight, 
parallels can be drawn between web development and areas such as town planning 
and landscape gardening. Let’s explore each of these further. 

Software engineering is about adopting a consistent and scientific approach, 
tempered by a specific practical context, to development and commissioning of 
systems or applications. Website development is often much more about creating an 
infrastructure (laying out the garden) and then 'tending' the information which grows 
and blooms within this garden. Landscape gardening involves the creation of a 
structure that is consistent with the initial objectives, but also takes into account the 
way in which the garden will change over time. A good initial design for a garden 
will allow this growth to occur in a controlled and consistent manner. The evolution 
of Web applications is analogous to a garden changing as a natural part of its cycle of 
growth. We have inherent growth (changes in the content), day to day maintenance 
of the garden (updating links, compressing databases, regenerating dynamic pages) 
and very occasional redesigns. In both cases, we are constantly working with a 
changing evolving system. 

We can also draw comparisons with town planning. This emerged out of the chaos 
that resulted as large numbers of people came to occupy relatively small areas of land. 
Without decent roads, water and plumbing such places were prone to regular 
disasters. While populations were small, a haphazard approach to organisation was 
sufficient, and problems could be addressed as they arose. For larger populations, 
problems quite readily became disasters. A well thought out, organised town tends to 
be less prone to problems, and, when problems do arise, they can be addressed more 
readily. 

But town planning goes further than organising the town as it stands now. It 
recognises that towns and cities are dynamic entities that tend to grow and change, 
and that this growth, if not managed carefully, will rapidly result in significant 
problems. Growth is planned for, and the town infrastructure is continually expanded 
to properly support future growth. This approach of the town planner holds a 
valuable lesson for the web developer. Web structures are rarely static. The client 
will want to continually add and change content, modify the look and feel, or enhance 
the functionality long after the site has been initially ‘commissioned’. Users will 
want new ways of accessing and navigating through the changing information. 
Technologies will evolve and become more sophisticated. In other words, the 
development of a web application does not halt when it is initially commissioned. 
Rather, its growth has just begun. The ability of the information architecture to cope 
with this growth is a significant factor in the perceived success or failure of a web site 
- and should therefore be an integral part of any set of acceptance criteria. 



4.3 Delivering Product and Process 

The upshot of the above discussions is that, unlike more conventional 
development, the development of Web applications needs to consider much more 
actively both the initial product and the process by which this will be managed and 
maintained. As borne out by current practice, the focus of successful Web 
development is not only the particular content, nor is it only the development of an 
architecture for organising and maintaining this content. Rather it extends to cover 
the integration of these elements into the activities and culture of the client 
organisation. This integration will need to involve both designing technological 
solutions which are suited to the business processes, and elements of BPR (Business 
Process Re-engineering) [10] where the business processes are adapted to the 
constraints or requirements of the technologies. These key observations give us clear 
pointers to the form which acceptance criteria must take. 



5 An Acceptance Criteria Framework 

Drawing the above diverse discussions together, we have developed an initial 
framework for specifying acceptance criteria for Web sites. This framework, shown in 
Table 2, identifies the key dimensions that should be covered in defining a "target" 
against which development can be carried out. 

Several important observations need to be made. First, unlike more traditional 
application development, these dimensions not only define a specific product, but also 
expectations about how that product should be able to evolve over time. Second, the 
three top-level categories (client/user, application framework, and application 
evolution) are tightly interrelated and cannot be treated - or specified - in isolation. 
Finally, the dimensions should not be mistaken for specifying possible designs, or an 
information architecture, or implementation constructs. They are solely intended to 
identify those aspects that need to be specified in order to define expectations of the 
outcomes of a Web development project. 

Table 2. Acceptance Criteria Framework 

Dimension                     Possible Representations               Example Elements
----------------------------  -------------------------------------  -------------------------------------
Client/User
  Client problem statement    (Natural language)
  Product vision              (Natural language)                     Client needs and business objectives
  Users                       (Natural language)                     User descriptions and models

Application
  Content modelling           Structured language, hypermedia /      Existing content structure.
                              information modelling languages        Information views. Navigational
                              (OOHDM, HDM, entity modelling, etc.)   structures. Required content
  User interaction            Modified TAM                           Usability and usefulness metrics
                              Structured language, hypermedia        Access mechanisms, user control
                              modeling, HCI models, etc.             behaviour, user orientation, search
                                                                     requirements, security control
  Development constraints     Natural language, standards            Adherence to corporate policies.
                                                                     Resource availability
  Non-functional              Natural language, quality metrics,     Reliability of content.
  requirements                adherence to standards                 Copyright constraints

Application Evolution
  Evolution directions        (Natural language)                     Expected content changes
  Client adoption/            Business Process Reengineering         Information dissemination paths.
  integration of Web                                                 Workflow changes
  Maintenance processes       Natural language, process models       Content maint. responsibility,
                                                                     Web management cycles

For each of the acceptance criteria dimensions we have provided some initial 
(though as yet unproven) representations which can form the basis of a specification 
language. These representations need to be understandable by both clients and 
developers - i.e. they must form a common language. This language provides the 
basis for the expression of clients' needs, and the basis for developers' design and 
implementation. We have not yet validated these, other than through some initial 
unqualified studies and using anecdotal information. 

In order to demonstrate the applicability of this framework to the specification of 
websites and web-based systems we have developed several examples. A simple 
partial example of a Web specification that utilises these concepts is shown in 
Appendix A. This illustrates how the various elements of the framework can be 
applied. We are currently undertaking work looking at how existing Web 
specifications map to this framework, and evaluating the use of the framework in 
commercial development projects. 



6 Conclusions and Future Work 

Based on a detailed consideration of how Web applications are being developed, and 
in particular the organic nature of Website evolution and maintenance, we have 
proposed a framework for defining acceptance criteria for Websites. These acceptance 
criteria can provide a basis for contract negotiations between clients and developers, 
but even more significantly, they can provide a basis against which the design can be 
carried out. The framework includes both the dimensions required to define 
acceptance criteria, as well as an initial tentative identification of potential 
representations that can be used for documenting the different dimensions of the 
acceptance criteria. These representations form a common language that enables 
clients, users and developers to effectively discuss the requirements of Web 
applications. 

The work to date has provided a justification for the need for an acceptance criteria 
framework, and the basis of a research agenda in this area. Future work will focus on 
developing a greater degree of rigor in the dimensions of the framework. We shall 
then clarify the possible representations that can be used to capture each of these 
dimensions. Indeed further research is likely to emphasise not the particular 
representations, but the constraints that the representations must meet to be valid for 
that dimension. 

A parallel stream of research is to correlate these dimensions and representations to 
empirical data on Web project specifications and development contracts. This will 
help us determine those aspects that have proven to be most useful for practical 
development, and their relative importance. Most significantly, this work will look at 
how user acceptance (measured using an adapted version of Davis’ technology 
assessment model [7]) relates to the significance of various application characteristics 
and hence potential acceptance criteria. 

Developing effective techniques for creating Web application acceptance criteria 
will have several significant benefits. These include: more effective and better 
managed negotiations between clients and Web developers; more clearly defined 
applications and hence higher quality applications; and a significantly improved 
ability to understand the scope of development early in the project and hence 
improved management, resource and costing. 

7 Appendix: Sample Specification 



The following is a collection of fragments from a typical Web specification. (Note: In 
the final version of this paper, a link will be provided to a full online version of a 
typical specification). 

XYZ Widget Company 

Web Acceptance Criteria 

Document number: ZYZWC-34TR4 

Version: 0.3a (draft) Date: 17th November 1999 

Author: David Lowe, University of Technology, Sydney 
Distribution: uncontrolled 

Overview 



This document contains the specification of the Web-based E-commerce system to 
be developed for XYZ Widget Company. It outlines key issues, development 
constraints and site requirements. This document contains: 

1. Client Problem Statement 

2. Site Vision 

3. User models 

4. Required content 

5. User interactions 

6. Non-Functional requirements 

7. Development constraints and technical restrictions 

8. Support for site maintenance 

9. Integration into client organisation 

10. Development schedules and deliverables 

11. Acceptance mechanisms and client liaison 

1. Client Problem Statement 

XYZ Widget Company is the leading European distributor of quality commercial- 
grade widgets. They have been in operation for over 70 years and have established 
an international reputation for quality products and efficient and effective service. 

In line with this emphasis on providing service to our clients we wish to extend the 
distribution channels to include the internet. Several of our competitors in the 
highly competitive widget industry have established a Web presence and we see 
this as both a significant threat and a huge opportunity to broaden our client base. 

In this context we see a Web presence as providing both a new channel providing 
access to similar services to those we currently provide, as well as a vehicle for 
extending the range of services. 

more details here 

2. Site Vision 



Client Needs: The clients of XYZ-WC are extremely diverse, coming not only from 
the commercial sector, but also from on-sellers, individual contractors, and 
government agencies. A full list of clients, along with general characteristics and a 
client needs analysis, is given in Appendix A. more details here 

Business objectives: The purpose of the site is two-fold. Firstly, the site must 
support the maintenance of the existing client base, by providing an enhanced 
service. Specifically, the site should not only support existing ordering processes, 
but also provide access to information not currently available. For example, the 
provision of Widget data sheets will facilitate the retention of clients. Secondly, the 
site must be able to support (and be consistent with) the active marketing 
campaigns currently carried out by XYZ-WC. More details here 

3. User models 



Appendix B contains full scenarios (diagrammed using object-oriented UML 
notation) that illustrate usage patterns that the site must support. 

more details here 

4. Required content 

The content that must be accessible from the site includes the following: 

♦ Full product catalogue, including product specifications, an image of each 
widget, product data sheets, cost and more details here. Note that users must be 
able to identify themselves and the relevant costing model utilised (including 
different currencies, and tax schemes). An example of the existing product 
information contained in the master product catalogue is shown in Appendices 
C and D. 

♦ Contact information for XYZ-WC 

♦ Corporate information on XYZ-WC 

♦ Information on the correct usage and installation of widgets 

♦ more details here 

Note that the site should be developed in a way which is consistent with the 
assumption that the information to be provided in the site will change regularly. 
Appendix E of this specification provides a model of the current information 
sources, represented using OOHDM. 

more details here 

5. User interactions 



The site must support effective interaction with users. Appendix B details typical 
usage scenarios that must be available. In addition to this the site must support the 
following interaction mechanisms: 

♦ Every page must contain the primary site menu, and an identification of the 
current location within the site structure. 

♦ A site map 

♦ A search engine that allows clients to search for Widgets by name, part 
number, classification, manufacturer, more details here 

♦ Users must be able to register with the system and then log in at a later date. 
Once logged in, the system will utilise information stored on company, delivery 
details, payment schemes, costing models, more details here 

♦ more details here 

Utilising TAM (Technology Acceptance Model) the developed site must rank at 
least 9.0 on the "perceived usefulness" scale and at least 6.0 on the "perceived 
ease-of-use" scale. 

more details here 

6. Non-Functional requirements 
more details here 

7. Development constraints and technical restrictions 
The site must be: 

♦ usable on all browsers from IE3 and Netscape 3 onwards 

♦ No page must be larger than 30k to download (including images) unless the 
user is explicitly warned of the large size. 

♦ more details here 

8. Support for site maintenance 

Appendix G of this specification details the current workflows used to maintain the 
product catalogues, ordering databases, invoicing and supply processes, and 
supplier purchases. The site must be integrated into these processes so that there is 
no impact on the ability to carry out, or cost in carrying out, these activities. The 
content in the site must be automatically maintained from the existing databases 
and integrate with these flows. 

More details here 

9. Integration into client organisation 
More details here 

10. Development schedules and deliverables 
more details here 

11. Acceptance mechanisms and client liaison 



more details here 

Appendices 

A. XYZ Widget Company Client list and characterisation 

B. Usage models: Scenario diagrams and use cases 

C. Example XYZ Widget product catalogue 

D. Sample of master product database 

E. OOHDM model of information sources 

F. Sample screen design 

G. Current workflow processes 

Abstract. Accurate estimates of development effort play an important 
role in the successful management of larger Web development projects. 
However, estimating the effort required in developing Web applications 
can be a difficult task. By applying measurement principles to measure 
the quality of applications and their development processes, feedback 
can be obtained to help control, improve and predict products and 
processes. Although to date most work in software effort estimation has 
focused on algorithmic cost models, in recent years research in the field 
of effort estimation has started to move towards non-algorithmic 
models, where "estimation by analogy" is one of the available 
techniques. The first part of this paper describes a case study evaluation 
(CSE) where proposed metrics and the effort involved in authoring 
Web applications were measured. The second half presents the use of 
analogy and two algorithmic models - linear regression and stepwise 
multiple regression - to estimate the authoring effort of Web 
applications, based on the datasets obtained from the CSE. Results 
suggest that estimation by analogy is a superior technique and that, with 
the aid of an automated environment, it is a practical technique to apply 
to Web authoring prediction. 



1 Introduction 

The World Wide Web (Web) has become the best known example of a hypermedia 
system. Numerous organisations world-wide have so far developed thousands of 
commercial and/or educational Web applications. However, developing good quality 
applications has a high cost, mostly in time and amount of difficulty involved for the 
authors [1]. Consequently, by applying measurement principles to measure the quality 
of applications and their development processes, feedback can be obtained to help 
control, improve and predict products and processes. 

S. Murugesan and Y. Deshpande (Eds.): WebEngineering 2000, LNCS 2016, pp. 295-310, 2001. 

© Springer-Verlag Berlin Heidelberg 2001 

In addition, software practitioners in general recognise the importance of accurate 
estimates of effort to the successful management of major software projects. The Web 
is no exception. By having realistic estimates at an early stage in a project's life cycle 
project managers and development organisations can manage resources properly. 

Rather than examining all possible processes involved in the development of Web 
applications and their prediction opportunities, this paper focuses on effort estimation 
based on solely the authoring process. We adopt the classification proposed by Lowe 
and Hall [2] where authoring is considered a subset of the whole Web development, 
characterised by managing the actual content and structure of the application and how 
it is presented. 

To address these issues, this paper presents the results of a quantitative case study 
which empirically validates a set of metrics proposed to measure the effort involved 
in authoring Web applications. The paper also investigates, based on the data 
collected in the case study, the use of three effort estimation models in predicting the 
effort involved in authoring Web applications: Estimation by Analogy; Linear 
Regression; Stepwise Multiple Regression. Although literature in software estimation 
shows that the analogy method outperforms or mirrors the best algorithmic method 
[3] we wanted to investigate if the results would be similar when applied to Web 
applications. 

Section 2 describes the case study and Section 3 presents the effort prediction 
models used, the initial results obtained and measures the prediction power of the 
models employed. Finally, we give our conclusions and comments on future work in 
Section 4. 



2 Measuring the Web Authoring Process 

We used a case study evaluation to measure the effort involved in authoring Web 
applications and several attributes (metrics) of Web applications. Our main objective 
was to collect data and investigate if those metrics could be used as parameters in an 
effort prediction model for Web authoring. The metrics proposed are as follows: 

• Hyperdocument size - the number of documents that each Web application has. 
Documents are HTML files. A Web application is a collection of documents 
designed with a specific aim. In our case, the domain considered to be 
representative was education. The Web applications considered are ‘static’, 
where the number of dynamically generated pages is small, if not absent. 

• Reused-documents - represents the number of HTML files which have not been 
created from scratch. 

• Connectivity - refers to the number of links that the Web application has. The 
links considered here are either structural or referential [4], leaving apart 
dynamically computed links. 

• Compactness [5] - indicates how inter-connected the documents are. A 
completely connected application means that, from each document, there are 
links reaching all other documents in the application. Compactness was measured 
in the case study by estimating its value from 1 (completely disconnected) to 5 
(completely connected). Consequently, the compactness measured was the 
perceived compactness (P-Compactness). 

• Stratum [5] - indicates to what degree the Web application is organised for 
directed reading. A sequential application represents an application where there is 
only one path to be followed by the reader. For the case study evaluation, 
subjects were asked to estimate the stratum of the application, from 1 (no 
sequential navigation) to 5 (sequential navigation). Consequently, the stratum 
measured was the perceived stratum (P-Stratum). 

• Structure of the application - represents the way in which the documents have 
been linked: sequence, hierarchy, and network [6]. A sequential structure 
corresponds to documents linearly linked; a hierarchical structure denotes 
documents linked in a tree shape and a network structure for documents linked in 
a net shape. 

• Effort - represents the estimated elapsed time (number of hours) it took each 
subject to author a Web application. 

The proposed metrics adhere to the Representational Theory of Measurement and 
have been theoretically validated according to the validation framework proposed by 
Kitchenham et al. [7]. Their theoretical validation is presented in [8]. 

The case study consisted of the development of Web applications aimed at 
teaching human-computer interaction concepts. These applications were structured 
according to the Cognitive Flexibility Theory (CFT) principles [9] and were created 
using a minimum of 50 documents. 

The Web applications were developed by 76 Computer Science students from the 
University of Southampton enrolled in the Human-Computer Interaction course. Two 
questionnaires were used to collect the data. The first asked subjects to rate their 
level of Web authoring experience on a scale from 0 (no experience) to 10 (a great 
deal of experience). The second (Appendix A) was used to measure the proposed 
metrics. Members of the research group checked both questionnaires for ambiguous 
questions, unusual tasks, definitions in the appendix and number of questions. 

Analysis of the questionnaires showed that six were missing fundamental 
information. Consequently, these samples were not used in the data analysis. 

In order to reduce any learning effects [10] subjects were given an initial task - to 
develop their personal Web pages. Two Web tutorials were suggested to allow 
subjects with no Web authoring experience to learn how to develop Web pages. In 
addition, all subjects received training on the CFT authoring principles (90 minutes). 



Table 1. Sub-tasks used in the questionnaire 

Life Cycle Phase   Sub-tasks
-----------------  ------------------------------------
Design             Structure the application
Contents           Create the text contents
                   Create the graphics contents
                   Create the video contents
                   Create the audio contents
                   Create the links
Interface          Planning of the interface
                   Implementing of the interface
Test               Testing the links in the application
                   Testing the media

The second questionnaire had two sections: one to collect attributes of the 
application and another to collect the elapsed time spent in developing the application. 
Development was sub-divided into several sub-tasks along the development life cycle, 
as shown in Table 1. 

The sub-tasks identified have been adapted from [11,12,13,14,15]. 

Here are some comments on the validity of the case study. The reader is referred to 
(mendes 2000) for a detailed description: 

• P-Compactness and P-stratum were estimated by subjects, rather than directly 
measured in the applications. It may be argued that estimated values are not 
ideal and may weaken the results obtained. However, it would not be feasible to 
ask subjects to calculate those time-consuming measures. Compactness, as 
proposed by Botafogo et al. [5], also uses an estimated value in its calculation 
(the conversion constant), leading to subjective measures. Finally, stratum, as 
proposed by Botafogo et al. [5] is described by the authors as “not a perfect 
metric, where some problems are evident”. Therefore, we decided to re-use these 
concepts but to employ them differently. 

• Although the number of documents has been identified by hypermedia authors 
as a valuable metric to measure hyperdocument size [16], there are other metrics 
which can also be used (e.g. length of the application, complexity of documents 
and functionality offered) and they will be the focus of our next evaluation. 

• There were three confounding factors [10] in this evaluation: i) subjects’ 
experience; ii) maturation effects, the effects caused by subjects learning as an 
experiment proceeds; and iii) tools used to help develop/manage the Web 
application. The data collected showed that: i) experiences varied greatly. 
Consequently, it was decided to consider two separate groups - subjects with 
experience levels ranging from 1 to 5 (41 subjects) - LEL group - and subjects 
with experience levels ranging from 6 to 10 (29 subjects) - HEL group; ii) 
subjects were asked to develop their Web pages before developing the 
applications and they all received training in CFT principles. Consequently, it 
seems that the maturation effects were controlled, or at least reduced; and iii) the 
number of tools used did not vary greatly. This result was confirmed 
statistically. 

• Applications had, on average, no more than 100 documents and 900 links, as 
subjects were supposed to spend up to three working days on their 
development. These numbers represent small applications and may not be 
representative of large Web applications developed by some organisations. On 
the other hand, the applications developed had interface/contents quality similar to 
or better than that of Web applications developed by professionals, as 
noticed by one of the authors when marking the Web applications developed by 
the students. These results seem to indicate that the results of the case study are 
likely to scale-up. 

• The subjects who participated in the case study may not be representative of Web 
professionals, as Computer Science students cannot be categorised as 
experienced Web authors. However, the use of students as subjects, while 
sometimes considered unrealistic, is justified for two reasons: firstly, empirical 
evidence by Boehm-Davis and Ross [17] indicates that students are equal to 
professionals in many quantifiable measures, including their approach to 
developing software; secondly, for pragmatic considerations, having students as 
subjects was the only viable option for this case study. 



3 Estimation Process 

Once the data had been collected we used one non-algorithmic model, estimation by 
analogy, and two algorithmic models, linear regression and stepwise regression, to 
estimate Web authoring effort. The predictive power of the estimation models was 
measured using the Mean Magnitude of Relative Error (MMRE) and the Median 
Magnitude of Relative Error (MdMRE) [18]. Both are described in detail in Section 
3.4. The statistical significance of the results obtained using MMRE and MdMRE was 
then verified by the T-test and the Mann-Whitney U test (described in Section 3.5). 
The statistical significance indicates if the differences in means or medians are caused 
by chance alone or whether the samples were drawn from different underlying 
populations [18]. 
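The two accuracy measures can be illustrated with a short sketch. It assumes the commonly used definition of the magnitude of relative error for a single project, MRE = |actual - predicted| / actual, with MMRE and MdMRE being the mean and median of the MRE values over all projects. The function names and the effort values are hypothetical and are not taken from the case study.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project (assumed definition)."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean of the MRE values over all projects."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def mdmre(actuals, predictions):
    """Median of the MRE values over all projects."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


# Hypothetical effort values in hours, for illustration only.
actual_hours = [10.0, 20.0, 40.0]
predicted_hours = [12.0, 15.0, 40.0]

print("MMRE :", mmre(actual_hours, predicted_hours))
print("MdMRE:", mdmre(actual_hours, predicted_hours))
```

Because the median is less sensitive to a few very large relative errors than the mean, reporting both values gives a more balanced picture of a model's predictive power.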



3.1 Non-algorithmic Models: Estimation by Analogy 

The first step to estimation by analogy is to characterise the project for which the 
estimate is to be made in terms of a number of variables (attributes). These variables 
are then used to find other similar projects (that have already been finished) and use 
the values from their variables to estimate effort. In other words, an estimate for the 
new project is made based on the known effort values for the finished projects. In the 
scope of this paper, projects are Web applications and size-related variables are 
hyperdocument size, connectivity, compactness and stratum. 

The range of variables must be limited to the information available at the point that 
the prediction is required. The variables should portray the project as precisely as 
possible. It is also essential to choose at least one variable that characterises the 
project in terms of its size. 

To estimate the effort using analogy we used an analogy effort estimation tool that 
automates the process and provides an environment where data can be stored, 
analogies found and estimates generated. The tool was developed at Bournemouth 
University under the name of ANaloGy softwarE tool (ANGEL) [3]. ANGEL finds 
the closest project by calculating the Euclidean distance from the project to be 
estimated to all the other projects in the projects dataset. The distance is measured in 
an optimum subset of the n-dimensional, normalised space. By normalised we mean 
that all dimensions are in the range 0 to 1, to ensure that they all have equal 
influence. One of the important features of ANGEL is the ability to determine the 
optimum combination of variables for finding analogies. This task can be time- 
consuming depending on the number of variables as it employs a brute force 
algorithm or exhaustive search of all possible permutations. This is relatively slow for 
a large number of variables. 
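The distance calculation described above can be sketched as follows. This is only an illustration of the idea, not ANGEL's implementation: it assumes min-max scaling as the normalisation, uses a fixed pair of variables rather than searching for the optimum subset, and all names and project values are hypothetical.

```python
import math


def normalise(projects, variables):
    """Rescale each variable to the range 0..1 across all projects (min-max)."""
    scaled = [dict(p) for p in projects]
    for v in variables:
        values = [p[v] for p in projects]
        lowest, span = min(values), max(values) - min(values)
        for s, p in zip(scaled, projects):
            # A variable with no spread carries no information; map it to 0.
            s[v] = (p[v] - lowest) / span if span else 0.0
    return scaled


def euclidean(a, b, variables):
    """Euclidean distance between two projects over the chosen variables."""
    return math.sqrt(sum((a[v] - b[v]) ** 2 for v in variables))


def rank_analogies(target, candidates, variables):
    """Return candidate ids ordered from most to least similar to the target."""
    scaled = normalise([target] + candidates, variables)
    scaled_target, scaled_candidates = scaled[0], scaled[1:]
    ordered = sorted(scaled_candidates,
                     key=lambda c: euclidean(scaled_target, c, variables))
    return [c["id"] for c in ordered]


# Hypothetical finished projects and a new project to be estimated.
variables = ["documents", "links"]
finished = [
    {"id": "A", "documents": 50, "links": 200},
    {"id": "B", "documents": 90, "links": 850},
    {"id": "C", "documents": 60, "links": 300},
]
new_project = {"id": "new", "documents": 58, "links": 320}

print(rank_analogies(new_project, finished, variables))
```

Without the normalisation step the number of links, which has a much larger numeric range than the number of documents, would dominate the distance.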

Although the number of analogous projects used to estimate the effort can vary, 
the software engineering literature suggests that the closest analogous 
project (1 analogy), followed by the two closest analogous projects (2 analogies), are 
generally the most effective options [3]. However, it is possible that dissimilar 
datasets may reveal different characteristics. It all depends on the number of finished 
projects to choose from and how similar the projects are. We have used 1, 2 and 3 
analogies to predict effort. 

Once the analogous projects are identified, an unweighted mean of the known 
effort values is used to determine the predicted value. ANGEL does not impose any 
particular set of variables to be used in the analysis, giving the user freedom to define 
the best set of variables to characterise a project. Consequently, the estimation process 
can take optimal advantage of the data available. The user chooses the number of 
variables and their names. The only exceptions are the project number, status and 
actual effort which are compulsory. The Status attribute shows whether or not a 
project has been completed. Projects whose status is "completed" can be used as a 
source of analogy. All completed projects must have values stored for their effort 
field as it supplies the foundation for prediction. 
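
The procedure described in this section can be sketched as follows. This is a minimal illustration, not ANGEL's code: the function names, the dictionary layout and the sample values are hypothetical, and we assume min-max normalisation computed over the completed projects together with the project to be estimated.

```python
import math

def normalise(projects, variables):
    """Rescale each variable to the range 0..1 across all projects (min-max)."""
    scaled = [dict(p) for p in projects]
    for v in variables:
        lo = min(p[v] for p in projects)
        hi = max(p[v] for p in projects)
        span = hi - lo
        for s, p in zip(scaled, projects):
            s[v] = (p[v] - lo) / span if span else 0.0
    return scaled

def estimate_by_analogy(target, completed, variables, k=1):
    """Unweighted mean effort of the k completed projects closest to target."""
    *done, new = normalise(completed + [target], variables)

    def distance(p):
        # Euclidean distance in the normalised space of the chosen variables.
        return math.sqrt(sum((p[v] - new[v]) ** 2 for v in variables))

    ranked = sorted(range(len(done)), key=lambda i: distance(done[i]))
    nearest = ranked[:k]
    return sum(completed[i]["effort"] for i in nearest) / len(nearest)

# Hypothetical data: three completed projects and one project to estimate.
completed = [
    {"size": 10, "connectivity": 20, "effort": 12.0},
    {"size": 50, "connectivity": 80, "effort": 40.0},
    {"size": 30, "connectivity": 40, "effort": 25.0},
]
target = {"size": 28, "connectivity": 45}

print(estimate_by_analogy(target, completed, ["size", "connectivity"], k=1))
print(estimate_by_analogy(target, completed, ["size", "connectivity"], k=2))
```

With one analogy the estimate is simply the effort of the closest project; with two or three analogies it is the plain average of their efforts, with no weighting by distance.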



3.2 Algorithmic Models: Linear Regression and Stepwise Regression 

The general purpose of linear and stepwise multiple regression models is to learn 
more about the relationship between several independent or predictor variables and a 
dependent or response variable. In the scope of this paper the dependent variable is 
effort and the independent variables are hyperdocument size, reused documents, 
connectivity, compactness, stratum and structure. Multiple regression and linear 
regression allow the researcher to ask (and hopefully answer) the general question 
"what is the best predictor of ...". 

Regression modelling is one of the most widely used statistical modelling 
techniques for fitting a quantitative response variable $y$ as a function of one or more 
predictor variables $x_1, x_2, \ldots, x_k$. Regression models are widely used because they often 
provide an excellent fit to a response variable when the true functional relationship, if 
any, between the response and the predictors is unknown [19]. 
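
As a small illustration of what fitting a response as a function of a predictor means, the sketch below computes an ordinary least-squares line for a single predictor. It is not the procedure used in this study, and the function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Minimises the sum of squared residuals (actual minus predicted).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical predictor (e.g. a size measure) and response (effort).
sizes = [1, 2, 3, 4]
efforts = [3, 5, 7, 9]
intercept, slope = fit_line(sizes, efforts)
print(intercept, slope)
```

Stepwise multiple regression extends this idea by adding or removing predictors one at a time according to their statistical significance.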

Both linear regression and stepwise regression were calculated using SPSS 9.0. 

3.3 Initial Results 

The HEL group contains information about 29 projects and the LEL group 
information for 41 projects. The datasets had at least four size-related variables: 
hyperdocument size, connectivity, compactness and stratum. 

Shepperd and Schofield [20] give evidence of the merit of dividing 
highly distinct projects into separate datasets. In addition, it is generally accepted that 
algorithmic models and estimation by analogy perform better on smaller, more 
homogeneous datasets [3]. Consequently, the LEL and HEL groups were divided into 
more homogeneous sub-groups, LEL-G1, LEL-G2, HEL-G1 and HEL-G2, which 
have 22, 19, 15 and 14 applications respectively. 

Results using Analogy. In order to evaluate how precise the estimations would be for 
each dataset, the following steps were followed: 

• One application at a time had its status changed to "active". 

• ANGEL was used to estimate the "best combination of variables" for 1, 2 and 3 
analogies. 

• Effort was estimated using 1, 2 and 3 analogies and the corresponding "best 
combination of variables". 
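
The steps above amount to a leave-one-out evaluation: each application in turn is treated as not yet completed and is estimated from the remaining ones. The sketch below shows that loop for a fixed set of variables. It is an assumption-laden illustration: the names are hypothetical, the variable values are assumed to be already normalised to the range 0 to 1, and the search for the best combination of variables is left out.

```python
import math

def predict(target, others, variables, k):
    """Mean effort of the k projects in `others` closest to `target`."""
    ranked = sorted(
        others,
        key=lambda p: math.dist(
            [p[v] for v in variables], [target[v] for v in variables]
        ),
    )
    return sum(p["effort"] for p in ranked[:k]) / k

def leave_one_out(projects, variables, ks=(1, 2, 3)):
    """Estimate every project from all the others, for each number of analogies."""
    results = []
    for i, held_out in enumerate(projects):
        # The held-out project plays the role of the "active" application.
        others = projects[:i] + projects[i + 1:]
        row = {"actual": held_out["effort"]}
        for k in ks:
            row[k] = predict(held_out, others, variables, k)
        results.append(row)
    return results

# Hypothetical, already normalised data with a single variable "x".
projects = [
    {"x": 0.0, "effort": 10},
    {"x": 0.1, "effort": 12},
    {"x": 0.5, "effort": 30},
    {"x": 1.0, "effort": 50},
]
results = leave_one_out(projects, ["x"])
for row in results:
    print(row)
```

Each row pairs the actual effort with the estimates for 1, 2 and 3 analogies, the same layout as the results tables in this section.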

For LEL-G1 our conclusions are as follows (see Table 2): 

• The variables that contributed most to the estimations were Hyperdocument 
size + Reused documents (18 applications), followed by Hyperdocument size (2 
applications). 

• Two and three analogies gave the greatest number of best effort estimations. 

For LEL-G2 our conclusions are as follows (see Table 3): 

• The variables that contributed most to the estimations were Connectivity + 
Stratum (8 applications), followed by Compactness (4 applications). 

• One and three analogies gave the greatest number of best effort estimations. 



Table 2. Results for the LEL-G1 group 

PROJECT  Actual  One      Two        Three      Best combination of variables
         Effort  Analogy  Analogies  Analogies  for the closest estimation
PROJ02   10.17   29       28.5       *8.4       Hyperdocument size
PROJ03   34      30       *32        31.67      Hyperdocument size + Reused docs
PROJ05   30      13.5     21.75      *25.83     Hyperdocument size + Reused docs
PROJ06   34      *31      30.5       30.33      Hyperdocument size + Reused docs
PROJ07   13.5    *30      *30        31.33      Hyperdocument size + Reused docs
PROJ08   25.58   29       *25.5      20.17      Hyperdocument size + Reused docs
PROJ14   15.67   29       *17.12     21.42      Hyperdocument size + Reused docs
PROJ15   4.92    *10.17   12.09      11.06      Hyperdocument size + Reused docs
PROJ17   7.75    30       *21.09     24.06      Hyperdocument size + Compactness + Stratum + Reused docs
PROJ18   10.5    22       *9.59      8.03       Hyperdocument size + Reused docs
PROJ19   30      15.05    *21        16.58      Hyperdocument size + Reused docs
PROJ22   28      *26.25   20.12      24.75      Hyperdocument size + Reused docs
PROJ24   26.5    *30      22.75      17.75      Hyperdocument size + Reused docs
PROJ27   9       22       16.09      *8.53      Hyperdocument size
PROJ28   28      22       18.84      *22.22     Hyperdocument size + Reused docs
PROJ30   31      34       *32        26.05      Hyperdocument size + Reused docs
PROJ33   26.25   28       31         *25.33     Hyperdocument size + Reused docs
PROJ34   22      10.5     28.5       *22        Hyperdocument size + Structure
PROJ36   15.5    30       28.25      *21.42     Hyperdocument size + Reused docs
PROJ37   30      *30      32         31.67      Hyperdocument size + Reused docs
PROJ39   29      15.67    23.73      *25.19     Hyperdocument size + Reused docs
PROJ41   14      4.92     7.55       *8.03      Hyperdocument size + Reused docs

* closest estimation 



Table 3. Results for the LEL-G2 group 

PROJECT  Actual  One           Two        Three      Best combination of variables
         Effort  Analogy       Analogies  Analogies  for the closest estimation
PROJ01   51      *44           78         64.17      Connectivity + Stratum
PROJ04   42      *42           40.75      69.06      Connectivity + Stratum
PROJ09   137     *[illegible]  46.5       61.67      Connectivity + Stratum
PROJ10   92      51            46.5       *91.06     Connectivity + Stratum
PROJ11   84      42            51.25      *94        Connectivity + Stratum
PROJ12   37.5    60.5          46.5       *45.83     Connectivity + Stratum
PROJ13   52.17   *51           46.5       81.67      Compactness
PROJ16   111     44            *47.5      46.06      Structure
PROJ20   42      84            72.25      *48.83     Connectivity + Compactness
PROJ21   53      137           114.5      *104.33    Connectivity + Stratum
PROJ23   46      *36.5         86.25      61         Compactness
PROJ25   44      111           *81.5      91.33      Connectivity + Compactness + Stratum
PROJ26   36.75   46            *41.25     58.58      Compactness
PROJ29   115     *36.75        35.62      32.5       Hyperdocument size + Connectivity + Stratum
PROJ31   60.5    37.5          31.38      *38.17     Connectivity + Stratum
PROJ32   36.5    46            *31.5      64.17      Hyperdocument size + Connectivity
PROJ35   112     *111          113        87.5       Reused docs
PROJ38   34.5    *35           40.5       39         Hyperdocument size + Connectivity + Compactness
PROJ40   35      112           *46.5      68.33      Compactness

* closest estimation 



For the HEL-G1 group our conclusions are as follows (see Table 4): 

• The variables that contributed most to the estimations were 
Hyperdocument size and Reused documents (5 applications), followed by 
Compactness, Stratum and Structure (3 applications). 

• One and two analogies gave the greatest number of best effort estimations. 

For the HEL-G2 group our conclusions are as follows (see Table 5): 

• The variables that contributed most to the estimations were 
Hyperdocument size, Compactness, Stratum, Reused documents and Structure (6 
applications), followed by: Hyperdocument size, Connectivity, Compactness and 
Structure (2 applications); Hyperdocument size, Connectivity, Stratum and 
Structure (2 applications). 

• One and three analogies gave the greatest number of best effort estimations. 



Table 4. Results for the HEL-G1 group 

PROJECT  Actual  One      Two        Three      Best combination of variables
         Effort  Analogy  Analogies  Analogies  for the closest estimation
PROJ23   26.08   126      69.5       56.61      Compactness + Stratum + Reused docs
PROJ24   26      14.75    14.75      17.5       Hyperdocument size + Compactness + Stratum + Reused docs + Structure
PROJ25   21.83   26.08    26.04      59.36      Reused docs
PROJ26   126     26       20.38      29.92      Compactness + Reused docs
PROJ27   39.17   78       78.25      56.5       Hyperdocument size + Compactness + Stratum + Reused docs + Structure
PROJ28   14.75   49       36         33         Hyperdocument size + Compactness + Reused docs
PROJ29   13      26.08    26.04      24.64      Reused docs
PROJ30   14.75   26.08    26.04      24.64      Reused docs
PROJ31   10.08   26.08    26.04      24.64      Reused docs
PROJ32   49      78.5     46.62      73.08      Hyperdocument size + Compactness + Stratum + Reused docs + Structure
PROJ33   30.83   49       87.5       67.33      Hyperdocument size + Stratum + Reused docs
PROJ34   78.5    49       87.5       67.33      Hyperdocument size + Compactness + Stratum + Reused docs + Structure
PROJ35   78      39.17    44.08      55.56      Hyperdocument size + Stratum + Reused docs
PROJ36   49      14.75    18.88      22.86      Hyperdocument size + Compactness + Reused docs
PROJ37   22.33   26.08    26.04      24.64      Reused docs
PROJ38   27      126      102.25     76.83      Connectivity
PROJ39   23      49       63.75      51.17      Structure

* closest estimation 



Table 5. Results for the HEL-G2 group 

PROJECT  Actual  One           Two        Three      Best combination of variables
         Effort  Analogy       Analogies  Analogies  for the closest estimation
PROJ01   51      *44           78         64.17      Connectivity + Stratum
PROJ04   42      *42           40.75      69.06      Connectivity + Stratum
PROJ09   137     *[illegible]  46.5       61.67      Connectivity + Stratum
PROJ10   92      51            46.5       *91.06     Connectivity + Stratum
PROJ11   84      42            51.25      *94        Connectivity + Stratum
PROJ12   37.5    60.5          46.5       *45.83     Connectivity + Stratum
PROJ13   52.17   *51           46.5       81.67      Compactness
PROJ16   111     44            *47.5      46.06      Structure
PROJ20   42      84            72.25      *48.83     Connectivity + Compactness
PROJ21   53      137           114.5      *104.33    Connectivity + Stratum
PROJ23   46      *36.5         86.25      61         Compactness
PROJ25   44      111           *81.5      91.33      Connectivity + Compactness + Stratum
PROJ26   36.75   46            *41.25     58.58      Compactness
PROJ29   115     *36.75        35.62      32.5       Hyperdocument size + Connectivity + Stratum
PROJ31   60.5    37.5          31.38      *38.17     Connectivity + Stratum
PROJ32   36.5    46            *31.5      64.17      Hyperdocument size + Connectivity
PROJ35   112     *111          113        87.5       Reused docs
PROJ38   34.5    *35           40.5       39         Hyperdocument size + Connectivity + Compactness
PROJ40   35      112           *46.5      68.33      Compactness

* closest estimation 



Results using Linear and Stepwise Regression. The formulas obtained by using 
linear regression and stepwise multiple regression are as follows (see table 6): 



Table 6. Predicted Effort using Linear and Stepwise Regression 

          Formula used to obtain the Predicted Effort
Datasets  Linear Regression                  Stepwise Multiple Regression
LEL-G1    Hyperdocument size*0.37            Hyperdocument size*0.50
LEL-G2    Reused documents*0.56              Hyperdocument size*0.50 + Reused documents*0.53
HEL-G1    Hyperdocument size*0.62 +          Compactness*0.54
          Compactness*0.58
HEL-G2    Hyperdocument size*0.57 +          Hyperdocument size*0.52 + Reused documents*0.60
          Reused documents*0.53



For both the linear and stepwise regression we have only considered in the 
formulas variables that presented statistically significant relationships with effort. 



3.4 Measuring the Predictive Power of the Estimation Models 

The predictive power of the estimation models was evaluated with two test metrics: 

• Mean Magnitude of Relative Error (MMRE) 

• Median Magnitude of Relative Error (MdMRE) 

The MMRE and the MdMRE give an indication of the capability of a predictive 
model. Both are widely used and are not limited to regression based methods [3]. The 
mean, used by the MMRE, takes into account the numerical value of every single 
observation in the data distribution. Although this characteristic makes it an integral 
component of statistical analysis it is not a good option for skewed distributions. On 
the other hand the median, which is used by the MdMRE, is appropriate as a measure 
of central tendency of a skewed distribution of data [18]. 

Small MMREs and MdMREs indicate good prediction models. 

Both MMRE and MdMRE use the Magnitude of Relative Error (MRE) in their 
calculations, which is defined as: 

$$\mathrm{MRE} = 100 \times \left| \frac{E_{act} - E_{pred}}{E_{act}} \right| \qquad (1)$$

$E_{act}$ is the actual effort and $E_{pred}$ is the estimated effort. 
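
A short sketch of these measures, following Equation (1), is given below. The function names and the sample figures are hypothetical and are chosen only to show how one poor prediction pulls the mean upwards while leaving the median unchanged.

```python
from statistics import mean, median

def mre(actual, predicted):
    """Magnitude of Relative Error, as a percentage (Equation 1)."""
    return 100 * abs((actual - predicted) / actual)

def mmre(pairs):
    """Mean MRE over (actual, predicted) pairs."""
    return mean(mre(a, p) for a, p in pairs)

def mdmre(pairs):
    """Median MRE over (actual, predicted) pairs."""
    return median(mre(a, p) for a, p in pairs)

# Hypothetical (actual effort, predicted effort) pairs; the last is a poor estimate.
pairs = [(100, 90), (50, 60), (20, 50)]
print([mre(a, p) for a, p in pairs])
print(mmre(pairs), mdmre(pairs))
```

In this example the individual MREs are roughly 10, 20 and 150, so the MMRE is about 60 while the MdMRE is about 20, which illustrates why the median is the safer summary for a skewed distribution.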

The MMRE and MdMRE for the estimation models used in our evaluation are 
presented below (see table 7): 



Table 7. Summary of the Predictive Power of the Estimation Models 

          MMRE    MdMRE   MMRE    MdMRE   MMRE    MdMRE
          (%)     (%)     (%)     (%)     (%)     (%)
LEL-G1    29.15   11.20   49.19   45.74   50.83   36.39
LEL-G2    28.00   16.26   99.32   100     55.83   65.12
HEL-G1    23.34   18.21   64.30   45.07   92.50   93.04
HEL-G2    19.76   11.51   45.89   54.95   43.80   43.62



The main finding in Table 7 is that in all cases the analogy method outperforms the 
best algorithmic method, suggesting that, at least for the datasets used, the analogy 
approach is a superior technique to the algorithmic methods. 

Another finding in Table 7 is that stepwise regression, contrary to expectation, is 
not consistently better than linear regression. A similar result has also been reported 
in Shepperd et al. [3], where they suggest the following explanation for the 
phenomenon: Regression analysis is based upon minimising the sum of the squares of 
the residuals. A residual is the difference between actual effort and predicted effort. 
As the MMRE is based upon the average of the relative residuals rather than their 
squares, poor predictions have far more influence when trying to fit a regression 
equation than they do when assessing the overall predictive performance of the 
method. This can lead to small anomalies in the relative performance of linear 
regression and stepwise regression models. 



3.5 Statistical Significance of the MMREs and MdMREs 

The statistical significance tests are important since they evaluate whether the 
differences are caused by chance or are legitimate. The former means that the samples 
used are drawn from the same population. The latter means that the samples are 
drawn from different populations. 

The statistical significance of the results was tested using: 

• The T-test of mean difference between paired MREs (for MMRE). 

• The Mann-Whitney U Test (for MdMRE). 
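
The test statistics behind these two tests can be sketched as follows. This is an illustration only: the function names and sample MREs are hypothetical, the code computes the statistics but not their p-values, and significance would still have to be read from tables or obtained from a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t(mre_a, mre_b):
    """t statistic for the mean difference between paired MREs."""
    diffs = [a - b for a, b in zip(mre_a, mre_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs (a, b) with a > b, ties counting half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical MREs (%) of two methods on the same four projects.
method_a = [10, 20, 30, 40]
method_b = [12, 25, 33, 48]
print(paired_t(method_a, method_b))

# Hypothetical MREs of two methods, not necessarily paired.
print(mann_whitney_u([10, 20, 30], [15, 25, 35, 45]))
```

The paired T-test works on per-project differences, so it needs the two methods to be evaluated on the same projects, whereas the Mann-Whitney U statistic only compares the ordering of the two sets of MREs.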

The results obtained for the T-test and Mann-Whitney U Test are as follows (see 
Table 8): 



Table 8. Summary of results for the T-test and the Mann-Whitney U test 

          T-test                     Mann-Whitney U Test
Datasets  Methods   Statistically    Methods   Statistically
          Compared  Significant?     Compared  Significant?
LEL-G1    LRxSR     No               LRxSR     No
          LRxEA     No               LRxEA     Yes, at 1%
          SRxEA     No               SRxEA     Yes, at 5%
LEL-G2    LRxSR     Yes, at 1%       LRxSR     Yes, at 1%
          LRxEA     Yes, at 1%       LRxEA     Yes, at 1%
          SRxEA     Yes, at 1%       SRxEA     Yes, at 1%
HEL-G1    LRxSR     No               LRxSR     No
          LRxEA     Yes, at 5%       LRxEA     Yes, at 1%
          SRxEA     Yes, at 5%       SRxEA     Yes, at 1%
HEL-G2    LRxSR     No               LRxSR     No
          LRxEA     Yes, at 1%       LRxEA     Yes, at 1%
          SRxEA     Yes, at 1%       SRxEA     Yes, at 1%

LR - Linear Regression   SR - Stepwise Multiple Regression   EA - Estimation by Analogy 



For LEL-G1, the T-test did not show any statistically significant difference between 
estimation by analogy and the two algorithmic models. However, all the remaining 
comparisons confirmed that the predictions obtained using estimation by analogy 
were more powerful and precise than those generated by the algorithmic models 
employed. 



4 Conclusions and Future Work 

This paper has presented the results of a quantitative case study that empirically 
validates a set of metrics proposed to measure the effort involved in authoring Web 
applications. The paper also investigated, based on the data collected in the case 
study, the use of three effort estimation models, namely Estimation by Analogy, 
Linear Regression and Stepwise Multiple Regression, in predicting the effort involved 
in authoring Web applications. 

A noticeable pattern appeared in that estimation by analogy produces a superior 
performance in all cases when measured by MMRE and MdMRE. These results 
suggest that estimation by analogy is a viable option for some types of datasets. 

In terms of further work, the two original datasets used in this work had four 
variables that, in some sense, characterise the applications. However, as size can be 
described in terms of length, functionality and complexity [21], further investigation 
is clearly necessary to identify the combination of variables that gives the best results. 
We also plan to apply effort prediction to the whole development cycle of Web 
applications and to compare the performance of estimation by analogy and several 
regression models against human estimation. 

To conclude, there is an urgent need for adequate Web development effort 
prediction at an early stage in the development. As the use of the Web as a resource 
delivery environment increases, effort estimation can contribute significantly to the 
reduction of costs and time involved in developing Web applications. In this paper we 
have shown that analogy may be a viable estimation method for prediction, 
particularly when given the necessary tool support. 

We do not, however, wish to create the impression that analogy based prediction 
should replace algorithmic methods. Dissimilar datasets will have different 
characteristics implying that a variety of techniques should be contemplated. 

5 Appendix A - Questionnaire 



Name: 

Surname: 



For all time-related questions, please supply your estimate of elapsed time, in hours, 
considering only the mechanical aspects of the hypermedia design tasks and not the 
time it took you to research the contents. 
The underlined terms are defined in the appendix. 



1) How many web pages does your application have? 

2) How many web pages did you develop from scratch? 

3) How many web pages did you reuse from other sites? 

4) If you reused a web page, did you do this by (tick more than one if necessary): 

making a link to it from your application ( ) 
making a local copy of that page and using it ( ) 

5) When reusing a web page did you (tick more than one if necessary): 

add new links to it ( ) 
not add new links to it ( ) 

6) Approximately how many links did you create? 

7) What is the compactness of the application? 

Completely Connected 5 4 3 2 1 Completely Disconnected 

8) What is the stratum of the application? 

High stratum 5 4 3 2 1 No stratum 

9) Please circle the structure which best describes how the core documents of the application 
are organised: 

Sequence Hierarchy Network 

10) For each of the following tasks, please estimate the time it took to: 

Create all: 

a) text contents 

b) graphics contents 

c) video contents 

d) audio contents 

Create all: 

e) links 

Structure the application 

Design the Interface: 

h) planning of the interface 

i) implementing of the interface 

Test the application: 

j) testing the links in the application 

k) testing media, i.e., testing if the video works, the sound works, etc. 



11) What tools did you use to develop the application and for what purpose? Tick more than 
one if appropriate: 

                                                      Purpose 

( ) An HTML editor 

( ) An application generator 

( ) Software to organise and manage the HTML files 

( ) if other, which one? 

QUESTIONNAIRE APPENDIX 

Compactness - the connectedness of the application. A high compactness indicates that from 
any document a user can easily reach, using links, any other document in the application, 
suggesting a large amount of cross-referencing. 

Hierarchy - a structure of an application where the documents are organised as a tree (see 
figure 1). 




Figure 1 - A Hierarchy 



Network - a structure of an application where the documents are arranged as a net of 
documents and links (see figure 2). 



Figure 2 - A Web 

Sequence - a structure of an application where the documents are linearly linked (see figure 3). 

Figure 3 - A sequence 

Stratum - suggests if there is an order for reading the application. Minimum stratum means 
that structurally it makes no difference from what document one starts reading. Maximum 
stratum is achieved in a linear application (the topology is a sequence). 



Abstract. Usability testing is normally used to determine the navigation 
map that best fits the average user's mental navigation model. 
However, usability testing is an expensive process, misses the influence 
that the computing environment has on navigation, and is not able to 
record spontaneous user behaviour. The remote testing technique is an 
interesting and cheaper alternative that avoids these problems. 
We have developed our own set of tools, based on data-gathering 
agents, for supporting remote testing. In this paper, we 
describe our experience in designing and developing such 
systems and the results obtained when conducting human-computer 
interaction experiments based on access to web sites. 



1 Designing Navigation Maps for Hypermedia 

During the analysis stage of a hypermedia educational tool, designers determine the 
educational goals to be reached, that is, the minimum amount of information that the 
audience should obtain when using the tool. Relevant information is then compiled 
and organised into categories. This process will provide a first sketch of the 
knowledge structure. 

The next task (called the design stage) simplifies the knowledge 
structure, creating small didactic units by following a divide-and-conquer strategy. This 
recursive process provides the final list of information nodes of the hypermedia 
product, where every node represents the smallest knowledge unit possible [1]. The 
resulting information nodes are then linked to each other by hyperlinks, creating a 
directed graph structure generally called a navigation map or navigation model [2]. 

Sometimes a single navigation map cannot satisfy navigation needs, so more than one 
must be incorporated into the hypermedia product. Some hypermedia design methods, 
such as OOHDM, allow the design of different models from the same knowledge 
base [3]. This goal is reached by designing different navigation maps, depending 
on the different kinds of final user. In the example shown in Figure 1, the 
knowledge base (identified during the analysis stage) might be divided into two (or 
more) different nodes during the design stage. Although the knowledge base is the 
same for both nodes (novice and expert), the granularity of the information provided 
by each node is not the same. While novice users may be happy with general 
concepts, expert users normally require more detailed information. This is not a static 
model, as users are able to promote their status by means of evaluation. 

Fig. 1. Multiple views of an information node 



2 Testing Navigation Maps 

Whether in the World Wide Web or in a multimedia desktop-based application, users 
expect highly effective and easy-to-learn navigation mechanisms and metaphors. 
People need to determine what they can get in their environment and how to find their 
way around. Unfortunately navigation is not a simple task: people get lost and their 
searches get frustrated [4]. 

The main goal of the test stage will be to measure the quality of the navigation 
maps defined during the design stage in terms of navigation. Designers need to know 
which abstraction of the knowledge structure better adapts to the user needs. 

As navigation strategies are based on topological considerations of the knowledge 
structure [5], orientation in a navigation map involves aligning the user representation 
of the space (user's mental representation) with the designer's mental representation. 
The user's mental representation is based on the user's previous knowledge, 
experience and views of the navigation map structure. A close relationship between 
both mental representations is crucial for navigation because it will determine how the 
users move around the navigation maps and, even more important, it determines how 
users retrieve knowledge from the hypermedia product. 

Unfortunately, there is no clear way to predict how users will move through the 
hypermedia product, as there are too many cognitive factors involved. Navigation is 
a process where decisions must be made continually, regarding strategies for 
reaching the goal and determining whether the goal has been reached. These decisions 
sometimes follow a plan and sometimes respond to the environment [5]. A slight 
change in the navigation structure may result in a completely different navigation 
strategy. As a result, testers must expose the navigation maps to real users. 

This exposure can be done by prototyping. Under this approach, preliminary 
versions of the navigation map must be developed to verify its workability [6]. 
Once the prototype is ready, it can be verified by usability testing, watching and 
listening carefully to users as they work with the prototype while they are encouraged 
to perform certain tasks. 

3 Classic Usability Testing for Navigation 

The main goal of usability testing focused on navigation is to get as much information 
as possible about how the final user makes use of the set of the different navigation 
mechanisms designed. This information should be contrasted with what designers 
expected in order to measure the quality of the design. 

Usability testing is a very useful technique for improving the behaviour of the user 
interface of many devices and software applications. Estimating all the costs 
associated with usability engineering, a study performed by [7] showed that the 
benefits could be up to 5000 times the cost. Usability testing is also a powerful 
technique for revealing hidden user behaviour, and for discovering internal 
user-generated metaphors when dealing with the user interface of some kinds of 
machines [8]. 

However, applying classic usability testing to navigation has some disadvantages, 
which can make it a slow, expensive and inaccurate process: 

• Navigability testing requires high precision and permanent observation of the 
volunteers. 

• Testing navigation is not a simple task, so each volunteer needs at least one 
observer. 

• Long observation times are also required. 

• Due to the high testing costs, samples are usually small, so testing quality 
drops. 

• Testers usually choose volunteers among people with the same cultural background 
as the intended average user, so it is quite difficult to discover kinds of users 
that have not yet been identified. 

• Volunteers know that they are being observed. This situation adds cognitive 
external factors to the testing process, including nervousness, confusion and 
timidity. 

• Knowledge about the experiment can influence navigability too because volunteers 
know or at least guess what is expected from them, adapting their own internal 
navigational metaphors to the navigational model of the prototype provided. 

• Usability testing is performed on computers that match the technical specifications 
needed to execute the hypermedia product. This may not be the case for the final 
user’s computer. Some technical factors, such as long delays in system 
responses, may completely change the user behaviour. Some authors [9], [10], 
[11], [12], [13] have reported important changes in user behaviour when retrieving 
data from busy web sites. 

• As usability testing takes place behind the walls of a laboratory, testers miss the 
role that the environment plays in the user behaviour. 

4 Remote Testing 

In order to avoid the disadvantages mentioned above, we have proposed an evaluation 
technique called Remote Testing [14]. By means of this technique, testing is 
performed under the same conditions under which the final product will be used, that 
is, testing the prototypes in the user's computing environment and without the presence 
of testers. 

Under this ideal situation, users are unaware of the nature of the experiment, and even 
of the role they are playing. Users feel free to explore the navigation maps provided, as they 
are not under pressure, so external factors such as nervousness or confusion will not 
affect the results of a usability testing session. As the navigation testing is performed 
in the user's computing environment, it is possible to get information about how that 
environment affects navigation. Changes in user behaviour due to delays in system 
responses may now be detected and avoided. 

In the classic usability testing approach, the user comes to the laboratory to test 
software. In the remote testing approach, the software visits the user’s home to be 
tested. With remote testing, once a prototype is ready, it must be freely distributed as 
a demo version, so that it will be tested by everybody interested in the project, not only by 
the average user. This approach encourages the detection of new kinds of users. 

This technique facilitates experimentation, focusing testing on a single task 
(navigation) using special tools able to support usability testing in situations which 
are not under the observer’s control. As the evaluation takes place in the user's own 
computing environment, the new testing tools must be able to capture information 
about the user navigation session and send it to the testers for a post hoc analysis. 
The observation process is independent of the hypermedia device location, as it may 
be a local process (classic usability testing) or a remote one (remote testing). 

Data collected during usability testing focused on navigation consists mainly of the 
landmarks, routes and surveys [5] employed by users. Landmarks represent 
conceptually and perceptually distinct information nodes. Route knowledge means the 
understanding of the environment described in terms of paths between information 
nodes. Finally, survey knowledge describes the relationships between information 
nodes and provides information about the user's mental representation of the 
navigation map. 

Depending on the data collected, it may be possible to detect the kind of navigation 
strategy employed by the user: a situated navigation or a plan-based navigation [5]. In 
the first strategy, the user navigates employing situation-specific knowledge, 
landmarks, and incomplete information. It is a reactive strategy that is employed
when the goal looks achievable and/or close. The second strategy employs survey
knowledge to generate in advance a complete plan for achieving the goal. Individual 
differences impact which strategy is favoured, such as the navigator's expertise, 
domain knowledge or spatial abilities. 

In order to obtain the landmarks, routes, and surveys and to detect the kind of 
navigation strategy employed by volunteers, a remote testing tool should obtain at 
least the following information: 

• The set of information nodes visited during a single user session, including their
order, that is, the path followed by users when retrieving the information.

• Time of user arrival at and departure from an information node.

• The list of the most popular destinations and links of a navigation map.

• How long the user takes to obtain the information he or she was looking for inside 
the scope of every information node. 
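
The items listed above can all be derived from a plain list of timestamped visits. The
following Python sketch is only an illustration of that idea: the Visit record, the
function names and the sample session are assumptions introduced here, not part of the
tool described in this paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Visit:
    """One stay in an information node (times in seconds since session start)."""
    node: str
    arrival: int
    departure: int


def path(visits):
    """Ordered list of the nodes visited during the session."""
    return [v.node for v in sorted(visits, key=lambda v: v.arrival)]


def time_per_node(visits):
    """Total time spent in each node, summed over repeated visits."""
    totals = Counter()
    for v in visits:
        totals[v.node] += v.departure - v.arrival
    return dict(totals)


def most_popular(visits, n=3):
    """The n most frequently visited nodes, most popular first."""
    counts = Counter(v.node for v in visits)
    return [node for node, _ in counts.most_common(n)]


# Hypothetical session, for illustration only.
session = [
    Visit("Portal", 0, 120),
    Visit("History", 125, 200),
    Visit("Portal", 205, 230),
]
```

Keeping the raw arrival and departure times, rather than pre-computed durations, leaves
room to derive the gaps between nodes later on.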
5 An Automatic Navigability Testing System

In order to support remote testing, we have developed our own agent-based remote-testing
tool, called ANTS (Automatic Navigability Testing System). This tool was developed
using Java, so it works on almost any current multimedia platform, including
Macintosh, Unix and Windows. ANTS is also supported by a healthy collection of web
browsers (such as Microsoft Explorer or Netscape Communicator), so it can be used for
testing navigation maps on any kind of hypermedia device available (either
platform-dependent hypermedia applications or web sites located anywhere on the
Internet).

The design of ANTS is based on the client-server paradigm and follows the design
metaphor of the life of a community of ants. The ants (very small agents) go out from
their anthill (a central server) looking for food (the information) in every picnic
available (a single user navigation session). Once an ant gets the food, it comes back
to the anthill and stores it carefully in one of its warehouses (the server's files or
databases).




Fig. 2. The agents are downloaded from the ANTS server at the same time as any other
multimedia asset

When testing the set of navigation maps of a web site, the agents must be included 
in every web page to be tested, in the same way as other standard multimedia assets, 
such as sounds, text, pictures, etc. This task can be done manually by the web 
designer or automatically performed by a simple script. 

As the multithreaded nature of the anthill allows several navigation maps to be tested
concurrently, the information collected must be organised in categories, depending on
the navigation map the ant is testing. To know where to store the information sent, the
ant must inform the server about which navigation map it is currently monitoring. To
perform this task, an identification code must be assigned to every navigation map and
to every information node under testing. Both identifiers are unique and must be
assigned during the design stage of the hypermedia project.

The piece of HTML code below shows how to configure the ant's parameters for
testing a web site. Parameter MAP represents the code assigned to the navigation map
(a web site called Tirsus). Parameter NODE provides the code for the information
node (a web page identified as Portal). The last two parameters are
implementation-dependent and represent the server's host machine and the socket's
communication port. The AdvertApplet class shows a visual cue (an animated GIF) of the
ant it has inside, so the user knows that the web page visited is being tested by ANTS.

<applet
    code="dobra.productos.ants.ant.AdvertApplet.class"
    codebase="http://www.di.uniovi.es/~martin/BinJava"
    width=100 height=32>
    <param name="MAP" value="Tirsus">
    <param name="NODE" value="Portal">
    <param name="ANTHILL" value="pinon.ecu.uniovi.es">
    <param name="PORT" value=1932>
</applet>
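
The automatic insertion of agents by "a simple script", mentioned earlier, might look
roughly like the Python sketch below. It is a hedged illustration rather than the script
used by the authors: the function names ant_applet_tag and add_ant are hypothetical, the
applet class name is the one in the listing with its spacing normalised, and the rule of
inserting the tag just before the closing body tag is an assumption.

```python
from html import escape


def ant_applet_tag(map_code, node_code, anthill, port,
                   codebase="http://www.di.uniovi.es/~martin/BinJava"):
    """Return the <applet> markup that embeds an ant in one page."""
    params = {"MAP": map_code, "NODE": node_code,
              "ANTHILL": anthill, "PORT": str(port)}
    lines = ['<applet',
             '  code="dobra.productos.ants.ant.AdvertApplet.class"',
             f'  codebase="{escape(codebase, quote=True)}"',
             '  width=100 height=32>']
    for name, value in params.items():
        lines.append(f'  <param name="{name}" value="{escape(value, quote=True)}">')
    lines.append('</applet>')
    return "\n".join(lines)


def add_ant(page_html, tag):
    """Insert the tag just before </body>, or append it if none is found."""
    index = page_html.lower().rfind("</body>")
    if index == -1:
        return page_html + "\n" + tag
    return page_html[:index] + tag + "\n" + page_html[index:]
```

A designer could call ant_applet_tag once per information node, passing the MAP and
NODE codes assigned during the design stage, and write the result of add_ant back to
the page file.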

When the user visits an information node, the multimedia assets included inside it are 
downloaded from their web servers. Our agents behave as common multimedia assets 
and are downloaded from the ANTS server (see figure 2). 

Once the ant has arrived at the user's machine, it establishes a communication
channel with the server (via sockets or Java's Remote Method Invocation), which is
used to send the data collected (see figure 3).

An ant-assistant object located on the server side keeps the communication
channel alive during the entire navigation session. The anthill periodically verifies the
state of every communication channel opened by the list of active ant-assistant
objects. This practice prevents the existence of ant-assistant objects connected to dead
ants.



Fig. 3. The ant agent (grey star) in node B observes user behaviour and sends a report to the
anthill, establishing a bi-directional communication channel

The ant-assistant objects collect the information sent by every ant participating in the
navigation session, classify it, and send it to the warehouse object commissioned to
store the data in the proper file or database entry. The information is sorted according
to a specific code assigned to every navigation session. This code depends on the
kind of prototype being tested and is automatically generated by the ant itself. When
testing a web site, this code is a combination of the Internet Protocol (IP) address
and the name of the user's host machine.



6 Data Gathered by the Agents 



The remote testing approach requires the distribution of the navigation map
prototypes among the volunteers, so they can explore them using their own computer
equipment. This guarantees testing under the real user environment. In the case of
testing web sites, testers must install the different navigation maps designed on the
final web server.

The information collected by the agent consists of a complete identification of the 
different landmarks represented by the information nodes along with the user's time of 
arrival and departure to/from the nodes. 

The ants are able to report the exact amount of time that the user spent in every node
during his/her navigation session, even in situations where the user stops navigating to
use a different application, a common practice detected by ANTS in multitasking
environments. Agents running inside a web page are able to detect when the browser's
window is hidden by another application by means of the applet's start and stop
methods.

If the security manager of the Java Virtual Machine (JVM) allows it, the ant is also able
to collect information about the user, including the user name, the client's
platform or the client's operating system version [15]. By means of this information,
testers can learn about the technical specification of the user's computer
environment.

Notice that users should be informed from the very beginning of the experiment that
they are participating in a usability test. The only thing the users should be unaware
of is the observation technique that the researchers are going to use. This is a common
practice in merchandising, where researchers observe customer behaviour through hidden
video cameras [16]. This technique is similar to the one employed in personality tests.
Psychologists are interested mainly in the attitude of the volunteers as they solve the
test, rather than in the test's results. Other usability software [17] follows a similar
approach.

The following report was obtained using ANTS when testing the navigation map of 
a web site coded as Tirsus. In the report, the user arrived at 11:25:12 Hrs. (server local 
time) from a computer located inside the scope of the ECT time zone. The code for 
the navigation session corresponds to the client's IP address (156.35.94.1) and the 
client’s host machine (petra.euitio.uniovi.es). 

OOTLAB HCI Research Group
GADEA-ANTS II Ver. 2.01 1999-2000

DATE: February 10 2000 (Wednesday)
TIME: 11H25' GMT+00:00
IP ADDRESS: 156.35.94.1 (petra.euitio.uniovi.es)
TIME ZONE: ECT (-2)

LOGS: (Server local time)
11:25:12 - 11:31:25 [00:06:13] -> User visits <Portal>
3 seconds out.
11:31:28 - 11:33:31 [00:02:03] -> User visits <History>
2:23 seconds out [user does not explore this site]
11:35:54 - 11:36:12 [00:00:18] -> User visits <History>
5 seconds out
11:36:17 - 11:42:22 [00:06:05] -> User visits <Roman Walls>
2 seconds out.
11:42:24 - 11:44:28 [00:02:04] -> User visits <Roman Army>
11:44:28 -> Log out.

Nodes visited: 5.
Different nodes visited: 4.
Total Time: 00:10:41 seconds. AVG: 00:04:49 seconds.
Trans Time: 00:02:33 seconds. AVG: 00:00:38 seconds.
Real Time: 00:16:43 seconds. AVG: 00:04:11 seconds.
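
A report of this kind lends itself to automatic post-processing. The Python sketch
below, offered only as an illustration, extracts the visit lines and recomputes the
"Real Time" figures. The regular expression assumes that each visit occupies a single
line in the layout shown above, the name parse_visits is hypothetical, and sessions that
cross midnight are not handled.

```python
import re
from datetime import datetime

# One visit line: start - end [duration] -> User visits <node>
VISIT = re.compile(
    r"(\d\d:\d\d:\d\d) - (\d\d:\d\d:\d\d) \[\d\d:\d\d:\d\d\] -> User visits <([^>]+)>"
)


def parse_visits(report):
    """Return (node, seconds) pairs for every 'User visits' line."""
    visits = []
    for start, end, node in VISIT.findall(report):
        t0 = datetime.strptime(start, "%H:%M:%S")
        t1 = datetime.strptime(end, "%H:%M:%S")
        visits.append((node.strip(), int((t1 - t0).total_seconds())))
    return visits


REPORT = """\
11:25:12 - 11:31:25 [00:06:13] -> User visits <Portal>
11:31:28 - 11:33:31 [00:02:03] -> User visits <History>
11:35:54 - 11:36:12 [00:00:18] -> User visits <History>
11:36:17 - 11:42:22 [00:06:05] -> User visits <Roman Walls>
11:42:24 - 11:44:28 [00:02:04] -> User visits <Roman Army>
"""

visits = parse_visits(REPORT)
real_time = sum(seconds for _, seconds in visits)      # time with a page visible
different_nodes = len({node for node, _ in visits})
average = round(real_time / different_nodes)           # seconds per different node
```

With the five visit lines of the report, the sum of the visit durations is 1003 seconds
(00:16:43) and the average over the 4 different nodes rounds to 251 seconds (00:04:11),
which agrees with the "Real Time" line.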

User navigation behaviour and the time spent in every node are also registered.
For example, the user arrived at the page Portal at 11:25:12 and left it at 11:31:25, so
he or she stayed there for 6 minutes and 13 seconds. Then, the user moved to the page
History, arriving there at 11:31:28 (so the ant associated with the node History took 3
seconds to be downloaded from the anthill and to establish a connection with the
ant-assistant). After having the page visible for 2 minutes and 3 seconds, the user
leaves the site (opens a new browser window, hides the browser, etc.) for 2
minutes and 23 seconds. Then, the user resumes his/her navigation session, visiting
other pages of the web site.

At the end of the report, there are some statistics about the time spent by the user in
the site. In this example, the user visited 4 different pages, spending an average time
of 4 minutes and 11 seconds in each one.

7 Testing Navigation with ANTS 

In order to measure the usefulness of the proposed remote testing technique and to
test the current implementation of ANTS, we tried the tool by conducting a
human-computer interaction experiment. The goal of this experiment, which is described
in detail in [18], was to determine the influence that the distribution of visual objects
along the horizontal axis has on navigation, in order to find the most suitable location
for a navigation-bar.

The web site designed for this experiment is quite simple. It consists of an entry 
node with a central information panel and two navigation-bars (frames) which are 
identical (see figure 4). 

One of the bars was located on the left-hand side and the other on the right-hand side.
Both navigation-bars were provided with only one link. If a visitor wants to reach
the information used as the lure for this experiment, he or she has to select one of the
lateral frames. Naturally, there was no indication of which link leads to the
information desired. Our intention was to oblige users to explore both navigation-bars
carefully, so both were designed using a low colour contrast. This design is
intended to make reading difficult, so that the user has to pay additional attention.

As both navigation-bars are identical, a user who arrives at the web site faces the
problem of choosing between two identical options [19]. Notice that the action of
selecting a link depends directly on human cognition and involves ergonomic
considerations too. Choosing a link is not a cost-free action. It takes some time to
make the decision (determined by Hick's Law [20]), which is followed by a
mechanical movement to perform the click. The movement of the mouse pointer from
its original place to the link location takes some time too (determined by Fitts's
Law [20]).

Fig. 4. The web entry node described in the HCI experiment

From the reports obtained with ANTS, we computed the reaction time parameter (RT),
which represents the time spent by the user in the entry node before clicking on one of
the links provided. Depending on the amount of time spent by users before making
the decision, it should be possible to detect the user's observation strategy (analytic
vs. naive, look vs. bet, etc. [21]). Users who spend a long time before choosing a link
might follow an analytic strategy, while users who quickly select a link might
follow a naive strategy. By measuring RT we try to detect any possible
relationship between the link selected and the observation strategy followed.

After three weeks of data gathering, the number of valid navigation sessions
reported by ANTS was 342. The results, once the data collected had been processed, can
be seen in table 1. They show that, although the left-hand location was
selected in a higher number of navigation sessions, the difference between the number
of choices for the two locations is too small to be relevant. Both locations seem to
have the same level of popularity.

We would like to remark however that when a web designer decides to place a 
navigation bar in either the left-hand or right-hand location, he or she is designing the 
web site for 50% of the visitors only. This problem could be solved thanks to adaptive 
hypermedia, selecting a location for the navigation bar depending on the user’s 
preferences. 

As the average user reaction time for both locations was almost the same (approximately
23.94 seconds), we failed in our attempt to recognise any special observation strategy
depending on the link selected. However, we might venture to say that horizontal
location does not seem to affect user performance in a decisive way when navigating
through a hypertext device.
Table 1. Results of the experiment once the data has been collected and processed

GLOBAL RESULTS
Universe: 342 valid sessions.
Left hand choices: 179 (52.5%).
Right hand choices: 160 (47.5%).
Global Average for the Reaction Time parameter: 23.94 seconds.

LEFT HAND LOCATION
Chosen: 179 times. (52.5%)
Average for the Reaction Time parameter: 23.83 seconds.

RIGHT HAND LOCATION
Chosen: 160 times. (47.5%)
Average for the Reaction Time parameter: 24.08 seconds.
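
As a rough cross-check of the figures in table 1, the Python sketch below recomputes
the global reaction time as a weighted average and applies a normal approximation to a
two-sided binomial test of the left/right split. This test is not part of the original
analysis; it is added here only as an illustration, and it uses the two choice counts
exactly as listed in the table.

```python
import math

# Figures as listed in table 1.
left_n, left_rt = 179, 23.83
right_n, right_rt = 160, 24.08

total = left_n + right_n          # sessions in which a choice is listed
weighted_rt = (left_n * left_rt + right_n * right_rt) / total

# Normal approximation to a two-sided binomial test of H0: p(left) = 0.5.
z = (left_n - total / 2) / math.sqrt(total * 0.25)
p_value = math.erfc(abs(z) / math.sqrt(2))
```

Under these assumptions the weighted average stays within about 0.01 seconds of the
reported global value, and the p-value is far above the usual 0.05 threshold, which is
consistent with the observation that neither location is clearly preferred.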



8 ANTS as a Help System for Navigation 

Although the main goal of a tool such as ANTS is to retrieve information
from the navigation sessions, the open design of ANTS allows its use as a powerful
source of information too. Data collected by ANTS may be employed by other
systems to provide useful information to users when they explore a web site.

During a navigation session, the anthill knows the exact position of the user in the
navigation map. Help components (such as banners) may use this information to
tell the user about his or her current position in the navigation map as well as
the set of web pages that can be reached from that position. With this information, the
user may be able to change the navigation strategy employed. Depending on the quality of
the information provided by the help component, the user might abandon the initial
situation-specific strategy for a more efficient plan-based navigation scheme.

This helping feature follows the design metaphor of a GPS (Global Positioning
System) device. The current version of ANTS implements GPS objects that can be
used by other Java objects to help users in their navigation sessions. When the user
moves to a new page, the ant located in that page notifies the anthill of the event.
The anthill then fires this event to the set of GPS objects subscribed as
listeners to this kind of event, so they get valuable information about the current
location of the user in the navigation map.

Once the GPS has received a notification of the event, it can inform every
associated help-component about the new position of the user inside the navigation
map. The help-component can then change the information displayed, providing specific
information about the contents of the node that the user is visiting.
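
The notification chain described above (ant, anthill, GPS object, help-component) is an
instance of the listener pattern. The Python sketch below illustrates the flow of events
only; ANTS itself is written in Java, and the class and method names used here (Anthill,
GPS, Banner, ant_reports, position_changed, show) are hypothetical.

```python
class Anthill:
    """Receives reports from ants and notifies subscribed GPS objects."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, gps):
        self._listeners.append(gps)

    def ant_reports(self, map_code, node_code):
        """Called when an ant reports that the user has entered a node."""
        for gps in self._listeners:
            gps.position_changed(map_code, node_code)


class GPS:
    """Relays the user's current position to its help components."""

    def __init__(self):
        self.components = []
        self.position = None

    def position_changed(self, map_code, node_code):
        self.position = (map_code, node_code)
        for component in self.components:
            component.show(node_code)


class Banner:
    """A help component that displays node-specific text."""

    def __init__(self, texts):
        self.texts = texts
        self.current = ""

    def show(self, node_code):
        self.current = self.texts.get(node_code, "")


anthill = Anthill()
gps = GPS()
banner = Banner({"Portal": "You are at the entry page."})
gps.components.append(banner)
anthill.subscribe(gps)
anthill.ant_reports("Tirsus", "Portal")   # the ant in page Portal reports a visit
```

Because the help components only depend on the GPS object, new kinds of components can
be attached without changing the anthill.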

We have used this feature for the design of the web pages of the Archaeological 
Museum of Asturias (see figure 5). We have included an information banner as part of 
the visual space of the site. The banner is located at the bottom of the browser's visual 
space and it is always visible (it was included as a separate frame). The banner 
includes a GPS object (connected to the anthill) which informs the banner about the 
position of the user in the navigation map, so the information displayed by the banner 
varies depending on where the user is. 
Fig. 5. Visual space of the web site of the Archaeological Museum of Asturies



9 Conclusions 

Remote testing clearly increases the quality and productivity of the testing stage of a
hypermedia development process. This approach eliminates irrelevant noise (external
factors such as nervousness or confusion), captures additional information (the
influence of the environment on navigation), gathers accurate and reliable information
automatically and makes testing cheaper and more efficient.

When using automatic remote testing tools such as ANTS, data flows freely from its
origin to the server storage system, where it can be analysed off-line. As navigation
takes place in the user's own computing environment, there is no need to assign
expensive laboratory resources to testing. Testers are freed from the boring task of
capturing data, so they can focus their efforts on analysing the results obtained and
improving the quality of the navigation maps. As there is no need to assign human
resources to conducting user observation, the whole process of usability testing is
cheaper.

By means of this technique, we can record more navigation sessions with the same
human resources. It is now possible to increase the number of volunteers participating
in navigability testing, so the testing quality increases accordingly. In the experiment
described in this paper, we analysed a total of 342 navigation sessions. It
would have been completely impossible for us to conduct such a high number of
usability testing sessions with the limited human and hardware resources of our
laboratory. However, with remote testing, we were able to run the experiment
with the sole help of a quite old and obsolete PC (the anthill was
installed on an old i486 PC with 16 MB of RAM, running Linux and Java 1.1.6).

We want to remark, however, that remote testing is not intended to be a
substitute for classic usability testing. Our intention is to optimise usability testing
for certain special tasks. Remote testing is able to detect actions but it is unable to
observe how those actions are performed. For example, with remote testing it is quite
easy to get the number of clicks on a button during a navigation session, but it is
impossible to know what difficulties the user may have had when performing
the click. That is a task where classic usability testing seems to be unbeatable.

Remote testing works well when the different features to be tested can be
measured, and the tasks to be performed by the volunteers are simple and easy, but it
cannot be applied to complex tasks or used to measure subjective factors. Classic
usability testing should be used instead.
Web Maintenance and Reuse



1 Overview 

The importance of maintenance of Web sites and applications has already been 
widely recognised. From a logical point of view and from the understanding derived 
from developments in Object Oriented methods, it is also clear that reuse of what 
exists will always help in reducing the costs and the time for developing and 
maintaining Web-based systems. The two papers in this section report on ways to 
help Web maintenance and reuse. 

The first paper, Improving Web-Site Maintenance With TANGOW by Making Page 
Structure and Contents Independent, discusses how Web site maintenance can be 
improved by making page structure and contents independent. The paper proposes a 
method, called TANGOW (Task-based Adaptive learNer Guidance On the Web) for 
designing and maintaining Web-sites. In TANGOW, the information structure and 
contents are managed independently, which facilitates the maintenance tasks. It also 
allows the specification of a common design pattern for all the final HTML pages, 
which the system dynamically generates starting from static HTML pages. A real 
world example shows how TANGOW helps to solve the identified problems. 

The second paper, Web Design Frameworks: An Approach to Improve Reuse in
Web Applications, discusses approaches to improve reuse in Web applications. A Web
design framework is a conceptual approach to maximize reuse in Web applications.
The paper discusses the need for building abstract and reusable navigational design
structures, illustrating it with different kinds of Web Information Systems. A review
of the state of the art of object-oriented application frameworks is followed by the
rationale for a slightly different approach focusing on design reuse instead of code
reuse. The paper proposes OOHDM-frame, a syntax for defining the hot-spots of
generic Web application designs, whose applicability is illustrated by a case study in
the field of electronic commerce and a discussion of how to implement Web design
frameworks in different kinds of Web platforms.
Abstract. In this paper we discuss how Web site maintenance can be
improved by making page structure and contents independent. This is
the principle used in TANGOW (Task-based Adaptive learNer
Guidance On the Web) for designing and maintaining Web-sites. In
TANGOW, the information structure and contents are managed
independently, which facilitates the maintenance tasks. It also allows the
specification of a common design pattern for all the final HTML pages,
which the system dynamically generates starting from static HTML
pieces. In the first section we introduce some common issues and
problems in Web-site development. The TANGOW system is presented
in the second section, focusing on those characteristics that make it
useful for the development of Web-sites. Based on a real world
example, the next section shows how TANGOW helps to solve the
problems mentioned in section 1. Finally, some conclusions are drawn
from the described application.



1 The Problem to Be Solved 

In this paper we discuss how the maintenance of Web sites can be improved by
making page structure and contents independent. This is the principle used in the
TANGOW system to design and implement Web sites. This system has been used so
far to develop adaptive Internet-based courses [1], but as will be shown throughout the
paper, it has features that make it very appropriate for the above-mentioned purpose.

A Web site with information about TANGOW itself will be used to prove this 
statement. The site was initially developed in the classical manner with static HTML 
hyperlinks and contains information such as a short description of the system, 
documentation, related publications and system team, both in English and in Spanish. 
It can be accessed at http://www.ii.uam.es/esp/investigacion/tangow/present.html 

The structure of this Web site is shown in figure 1. It represents two page
trees, each of them corresponding to a different language. Pages, the nodes in the tree,
are organized in four different levels. For example, node n0 is the site’s home page,
which contains four links to nodes na, nb, nc and nd corresponding to the
above-mentioned categories: description, documentation, publications and team. Among
these four nodes only nb (documentation) contains hyperlinks to nodes in a lower
level of the page tree.




Fig. 1. The starting Web site page trees 

Each node in the tree is divided into three sections: header, contents and 
hyperlinks. Headers are named following the same scheme: 

- a letter (‘E’ or ‘S’) indicating the language, 

- the letter ‘H’ followed by a digit that indicates the level, 

- a series of digits and letters that refer to the page in the previous level to which 
they are linked, 

- and a final digit which establishes an order among the descendants of a given node. 
A similar scheme is used for naming the contents, but omitting the level digit.
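
The naming scheme can be expressed as a pair of small functions. The Python sketch
below is an interpretation rather than a definition taken from TANGOW: the function
names are hypothetical, the letter 'C' for contents is inferred from the content names
listed later in table 1, and the sample identifiers assume that names such as EH2b1
follow the pattern language, 'H', level, parent reference, order.

```python
def header_name(language, level, parent_ref, order):
    """Language letter + 'H' + level digit + parent reference + order mark."""
    return f"{language}H{level}{parent_ref}{order}"


def content_name(language, parent_ref, order):
    """Same scheme as for headers, with 'C' and without the level digit."""
    return f"{language}C{parent_ref}{order}"


# Sample identifiers (assumed readings, for illustration only).
english_header = header_name("E", 2, "b", 1)    # header of the first child of page b
spanish_content = content_name("S", "b", 1)     # Spanish contents of the same page
```

Because the parent reference is embedded in the name, the position of a piece in the
page tree can be recovered from its identifier alone.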

This Web-site has been designed taking into account that the information should be 
available in different languages and that pages should have a common appearance. 
From this starting point, other factors related to the design, maintenance and use of 
the Web-site should also be considered. For instance, when the same information 
appears in different pages, it should not be necessary to duplicate this information in 
every page. In addition, a procedure should be provided to maintain and update the 
information contained in the web pages in an easy and efficient way. This procedure 
should guarantee that the information provided is consistent. It would also be 
desirable to present users with different information according to their profiles. 

The next section describes the TANGOW system, a tool that facilitates the 
development of Web sites taking into account the desirable characteristics mentioned 
above. 



2 The TANGOW System 

TANGOW (Task-based Adaptive learNer Guidance On the Web) is a tool for 
developing Internet-based courses, accessible through any standard WWW browser. 
The contents of these courses are structured by means of teaching tasks and rules. 
They are stored in a database and allow TANGOW to guide students according to 
their profile. Relevant profile features are previously established by the course 
designer. In this guiding process, the actions the students perform while following the 
course are also taken into account. 

A teaching task is the basic unit in which contents are structured. Each task, which 
has a specific name, is defined by giving values to some slots. For our purposes, the 
most relevant ones are description(s), associated contents and atomicity. The 
description slot contains a text that briefly describes the corresponding concept. A 
description slot exists for every considered language. 

The associated contents slot is a list of media elements (text, images,
videos, applets, sounds, animations, ...) which will be used to dynamically generate the
web pages shown to the user. The multimedia elements are stored in different folders
depending on the selected profile features. When a media element is the same
regardless of these features, it is stored in a folder referred to as the “default folder”.

With respect to atomicity, teaching tasks can be atomic or composed. If a teaching 
task is composed, a rule needs to be created to specify how the task is decomposed 
into subtasks. A task can be decomposed in different ways (i.e., there can be several
rules associated with a task), each of which will have a different activation condition.
These activation conditions will depend on information about the student actions, the 
student’s profile and the learning strategy in use. 

Another decision designers must make concerns the order in which the subtasks
appearing in the right-hand side of a rule may be performed by the student or user
accessing the Web site. Different sequencing modes are available in the system. AND
sequencing can be chosen if the subtasks must be performed following a fixed order.
ANY sequencing indicates that they can be performed in any order. Finally, if
OR/XOR sequencing is chosen, it is enough for the user to perform some of them
(exactly one, in the XOR case) for the main task to be considered achieved.
Sequencing modes are used by the system to guide students. As an example, if
AND sequencing is specified for a given list of subtasks, the first one will be presented
to the student. Only when this is finished will the next one be offered. In any case,
the “back” and “forward” navigator options are always available to the student, who
can follow his/her own learning itinerary.
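
The four sequencing modes can be summarised in a few lines of code. The Python sketch
below reflects one reading of the description above and is not taken from the TANGOW
implementation: the function names are hypothetical, and it is assumed that AND and ANY
both require every subtask to be finished, differing only in the order allowed.

```python
def main_task_achieved(mode, done):
    """done holds one boolean per subtask, in the order given by the rule."""
    finished = sum(done)
    if mode in ("AND", "ANY"):
        return finished == len(done)    # every subtask must be performed
    if mode == "OR":
        return finished >= 1            # some of them are enough
    if mode == "XOR":
        return finished == 1            # exactly one
    raise ValueError(f"unknown sequencing mode: {mode}")


def offered_subtasks(mode, done):
    """Indices of the subtasks that may be offered to the user next."""
    pending = [i for i, finished in enumerate(done) if not finished]
    if mode == "AND":
        return pending[:1]              # fixed order: only the first unfinished one
    if mode == "XOR" and any(done):
        return []                       # one subtask has already been performed
    return pending
```

For instance, under AND sequencing with the first of three subtasks finished, only the
second subtask would be offered, whereas ANY sequencing would offer the second and the
third.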

In TANGOW the learning process is considered equivalent to achieving a
task, which is regarded as the main one among all those that have been defined by
the designer. This means that once all the required tasks and rules have been created,
the designer just has to choose one as the main task.

At execution time, all tasks initiated (and that may be finished) by the student are 
stored in what is referred to as a dynamic task tree. There, information about the 
actions performed by the student when interacting with the course is stored. This 
information includes the time the student has spent performing each specific task, the 
number of visited pages, the number of exercises done and the number of these which
have been correctly solved. These data are used by TANGOW to adapt the course
contents to the student's progress. As an optional facility, the dynamic task tree can be
stored at the end of a session and restored for subsequent student sessions. 

The TANGOW system generates HTML pages dynamically depending on
parameters that are either associated with the active task or related to the user profile
and actions. The concrete media elements that will appear in the generated page are
looked up in the corresponding folder, according to the user profile. If a media element
is not found there, it is taken from the default folder.
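
The lookup with fallback described above can be sketched as follows. This Python
fragment is only an illustration under stated assumptions: the function name find_media,
the folder layout (one sub-folder per profile plus one named "default") and the use of
file names as media identifiers are all hypothetical.

```python
from pathlib import Path


def find_media(root, profile_folder, name, default_folder="default"):
    """Return the path of a media element, preferring the profile folder."""
    for folder in (profile_folder, default_folder):
        candidate = Path(root) / folder / name
        if candidate.is_file():
            return candidate
    return None   # the element exists in neither folder
```

With this arrangement, a piece that is identical for every profile only has to be stored
once, in the default folder, which is what keeps the information consistent.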

Up to now TANGOW has been used for educational purposes, but there are many
reasons which make it feasible to extend its application domain so that it can be used 
to develop and maintain all the pages that constitute a Web site. These reasons will be 
detailed in the following section. 



3 How TANGOW Can Help 

In this section we will show the advantages of using TANGOW for designing and 
implementing a Web site about the TANGOW system. The new site should have the 
same contents as the one described in section 1.

In general, when designing a Web site with TANGOW it is necessary to choose a 
common appearance pattern and to design the required tasks and rules. In our case, 
the common pattern includes a background color and a heading. This heading is the 
name of the system and corresponds to node n0’s heading in figure 1.



3.1 Designing Tasks and Rules

Starting from the structure in figure 1, a set of tasks and rules must be designed. In 
general, each page corresponds to a task. The name of the task is derived from the 
node’s name. A task will be tagged as atomic if it corresponds to a leaf in the tree. 
Otherwise, it will be tagged as composed. 




Improving Web-Site Maintenance with TANGOW 329 



Table 1. Details of some defined tasks

      Task Slots
Task  Task Description                                  Task Content Media Folder Contents
Name  English                 Spanish                   Names        English               Spanish
----  ----------------------  ------------------------  -----------  --------------------  --------------------
T0    <->                     <->                       C0           EC0                   SC0
Ta    <Ea> Short system       <Sa> Breve descripcion    H1a, Ca      EH1a, ECa             SH1a, SCa
      description             del sistema
Tb    <Eb> Documentation      <Sb> Documentacion        H1b, Cb      EH1b, ECb             SH1b, SCb
Tc    <Ec> Publications       <Sc> Publicaciones        H1c, Cc      EH1c                  SH1c
                                                                                Cc
Td    <Ed> Team               <Sd> Equipo               H1d, Cd      EH1d, ECd             SH1d, SCd
Tb1   <Eb1> Architecture      <Sb1> Arquitectura        H2b1, Cb1    EH1b-EH2b1, ECb1      SH1b-SH2b1, SCb1
Tb11  <Eb11> Programs         <Sb11> Programas          H3b11, Cb11  EH1b-EH2b1: EH3b11,   SH1b-SH2b1: SH3b11,
                                                                     ECb11                 SCb11

The value of the description field is taken from the corresponding hyperlink in the
parent task. The associated content field has two parts: a heading and the rest of
the page contents. The heading is constructed by concatenating the headings of the
pages that appear in the path starting at level 1 of the tree. Different separators
are used to join the different elements in the heading. Note that neither the
description field nor the heading is required for the main task of the course.
Table 1 shows all the tasks corresponding to levels 0 and 1 in the page tree and a
sample task for each of levels 2 and 3.
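
As a rough illustration of how such a heading could be assembled, the following Python sketch joins the headings found on the path from level 1 down to the current page. The function name `build_heading` is hypothetical, and the separators ("-" between levels 1 and 2, ": " between levels 2 and 3) are an assumption read off the entry for the level-3 sample task in Table 1.

```python
def build_heading(path_headings, separators=("-", ": ")):
    """Concatenate the headings on the path from level 1 to the current page.

    separators[i] joins the heading at level i+1 with the heading at level i+2;
    the last separator is reused if the path is deeper than the separator list.
    """
    if not path_headings:
        return ""  # the main task (level 0) has no heading
    heading = path_headings[0]
    for i, part in enumerate(path_headings[1:]):
        sep = separators[min(i, len(separators) - 1)]
        heading += sep + part
    return heading


level1 = build_heading(["EH1b"])
level2 = build_heading(["EH1b", "EH2b1"])
level3 = build_heading(["EH1b", "EH2b1", "EH3b11"])
```

Because the heading is derived from the path, renaming a page's own heading automatically changes the heading of every page below it.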

In addition, a rule must be specified for each composed task. For each rule, the
LHS is the composed task T, while the RHS consists of the tasks
corresponding to T’s children pages in the page tree. OR is chosen as the default
value for the sequencing attribute in all rules. Table 2 shows the rules required for
the page tree in figure 1.
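
The relationship between rules and pages can be sketched with a small data structure. The task names below follow the rule table of this section; the dictionary layout and the helper names (`is_atomic`, `links_for`, `all_tasks`) are assumptions made for illustration and do not describe TANGOW's internal format.

```python
# One rule per composed task: LHS task -> RHS tasks (its child pages) and sequencing.
RULES = {
    "T0": {"rhs": ["Ta", "Tb", "Tc", "Td"], "sequencing": "OR"},
    "Tb": {"rhs": ["Tb1", "Tb2", "Tb3"], "sequencing": "OR"},
    "Tb1": {"rhs": ["Tb11", "Tb12"], "sequencing": "OR"},
    "Tb2": {"rhs": ["Tb21", "Tb22", "Tb23"], "sequencing": "OR"},
    "Tb3": {"rhs": ["Tb31", "Tb32", "Tb33", "Tb34"], "sequencing": "OR"},
}


def is_atomic(task):
    """A task is atomic when no rule decomposes it (a leaf of the page tree)."""
    return task not in RULES


def links_for(task):
    """Hyperlinks to generate on the page of a task: one per RHS task of its rule."""
    return list(RULES.get(task, {}).get("rhs", []))


def all_tasks(root="T0"):
    """Depth-first list of every task reachable from the root task."""
    tasks = [root]
    for child in links_for(root):
        tasks.extend(all_tasks(child))
    return tasks


site_tasks = all_tasks()
```

Since the links of a page are computed from the rule of its task, changing a rule is enough to change the navigation; no page has to be edited by hand.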

Finally, two comments about this new Web site are worth making. First,
some hyperlinks considered non-essential, such as
those to the department’s and university’s home pages, have been omitted. Second, as
the Web site will not be accessed for educational purposes, there is no need to
store any information about the “user’s progress”. For this reason, this facility is
deactivated.



Table 2. Rule description

LHS task                             RHS tasks                                    Sequencing
Name  English description            Name  English description
----  -----------------------------  ----  -------------------------------------  ----------
T0                                   Ta    Short system description               OR
                                     Tb    Documentation
                                     Tc    Publications
                                     Td    Team
Tb    Documentation                  Tb1   Architecture                           OR
                                     Tb2   Design
                                     Tb3   Why do we call it adaptive?
Tb1   Architecture                   Tb11  Programs                               OR
                                     Tb12  Data
Tb2   Design                         Tb21  Teaching tasks                         OR
                                     Tb22  Rules
                                     Tb23  Examples of teaching tasks and rules
Tb3   Why do we call it adaptive?    Tb31  Student profile                        OR
                                     Tb32  Teaching strategy
                                     Tb33  Student actions
                                     Tb34  Course design


The resulting Web site can be accessed at http://eneas.ii.uam.es/html/courses.html 
by selecting “The TANGOW system” course. One of the Web site pages 
(corresponding to node b3 in figure 1) is shown in figure 2. 

[Figure 2: page layout with the labelled regions: common pattern (includes background); header (EH2b3); page contents (ECb3); dynamic links/navigation; progress bar]

Fig. 2. A Web site’s page



3.2 Advantages of Using TANGOW 

Now that the design for the new site has been presented, we discuss the main
advantages of using TANGOW to implement a Web site. To do so, we analyze how well
the requirements listed in section 1 are fulfilled. These requirements are
multilinguality, common design, common contents, maintainability and adaptivity.

As for multilinguality, one of the main advantages of using TANGOW for 
designing and implementing Web sites is that it helps to save work if the same 
information has to be accessible in different languages. This is so because the same 
tasks and rules can be used for all the languages. The designer just has to provide the 
task description in the various languages and place the contents in the corresponding 
folders. The appropriate description and contents will be chosen by the system at 
runtime to build the page that will be presented to the user. 

As mentioned in section 1, one of our requirements was that all pages should have
a common structure. This is very easy to implement in TANGOW, given that
pages are generated dynamically. The Web site designer can specify
general aspects of the page layout such as the background color, fonts, etc. These
specifications are followed by the system every time a page is generated. If no
layout features are specified, a default pattern is used. The difference from style
sheets is that the common structure specifies not only format but also specific
text or figures that should appear in all pages. This could also be achieved with a
template, but whenever the template is changed, all pages already generated must be
updated manually.

Another desirable feature of a system intended to help in designing and implementing
Web sites is that when different pages have contents in common with small
differences, it should not be necessary to duplicate this information in every page. In
our example, this happens with the list of publications about the system being
described, which is the same regardless of the user’s language. The only difference
between the list of publications in, say, Spanish and English is the
corresponding header. As this is the case, it is enough to write this list once and place
the HTML piece in the default folder.

A procedure should be provided to maintain and update the information contained 
in the web pages in an easy and efficient way. In our case and directly related to the 
previous point, the fact that information is not duplicated facilitates the maintenance 
procedures since just a single HTML piece must be updated. Moreover, given that 
hyperlinks are automatically generated, link consistency is guaranteed if any change 
is made to the Web site structure. 

Although in the example presented adaptivity is implemented only in terms of the 
user’s language, additional user profile features can be taken into account. For 
example, in an educational Web site teachers, students and researchers could be 
presented with different information. In principle, researchers’ interests would relate to
publications and projects, while teachers and students would need to have access to 
timetables, marks, and so on. In the case of an enterprise Web site, profiles could be 
used to differentiate between visitors and employees. 



4 Related Work 

In [2], some general issues in adaptive Web site design are discussed. The most
important ones are: the separation of a conceptual representation of an application
domain from the content of the Web site, the separation of content from adaptation
issues, the structure and granularity of user models, the role of a user and application
context, and the communication between different adaptive Web site "engines". Some
of these issues have already been taken into account in the approach described above.

An existing system that can be used to implement Web-sites is Hyper-G [3], a 
'second generation' network information system that provides automatic structuring 
and maintenance of link consistency, thanks to the use of Hyper-G's aggregate
structures. It also supports multilinguality by allowing multi-language versions of 
documents. A deficiency of the system is that a different client per operating system is 
required: Harmony is the native Hyper-G client for X Windows on Unix platforms, 
while Amadeus is the client for MS-Windows. 

There are some other systems for automatic hypertext generation, such as those based
on the “lexical chaining” technique [4]. In these systems, sequences of related words are
discovered in a text or even in different texts, and links are automatically generated
between those documents or parts of documents that are semantically similar. In this
case, no common structure is available, and it is not possible to reuse pieces of
hypertext in different pages, because the documents are generated beforehand
and only the links are created dynamically. Moreover, there is no structure
behind the documents. That is why it is very difficult to maintain a Web site based on
this kind of link generation.

An effective way of supporting the maintenance of Web sites and the reusability of
their elements is to produce modules that may be reused in different parts of the same
site. These modules may be documents or pieces of documents related to each other
by different types of relationships [5], and will compose the HTML pages presented
to the users.

Furthermore, there are several approaches that facilitate the reuse of multimedia 
elements in Web sites or, more specifically, in Web courses. For example, in [6] 
multimedia elements are classified and associated to course contents by means of 
indices. When the HTML pages associated to a concept must be generated, the system 
consults the indices associated to the concept and selects the related elements. 

It is also important to take into account two other problems that usually arise when
a user is navigating through a Web site: disorientation and cognitive
overhead. There are different possible solutions related to site maps [7] that must be
considered when developing a new Web site.



5 Conclusions 

This paper shows how TANGOW, a system originally intended for educational 
purposes, can be used to design and implement a Web site. One of the main 
advantages of using TANGOW for designing and implementing Web sites is that it 
helps to save work when the same information has to be accessible in different 
languages. 

The power of TANGOW is due to the clear separation between page structure and 
page contents. This mechanism allows the designer to specify a common design 
pattern, which is followed when generating the HTML pages presented to the user. As 
shown in the paper, this mechanism is more powerful than the use of style sheets or 
templates. 

Since contents are stored separately and organized in a coherent manner, they can 
be shared between different pages. This facilitates page maintenance and updating
while guaranteeing information consistency. In addition, page contents are adapted to 
the user according to his/her profile. Although in the example presented only the 
user’s language has been considered, additional user profile features can be taken into 
account. 

Abstract. In this paper we introduce Web design frameworks as a
conceptual approach to maximize reuse in Web applications. We first 
discuss the need for building abstract and reusable navigational design 
structures, exemplifying with different kinds of Web Information 
Systems. Then, we briefly review the state of the art of object-oriented 
application frameworks and present the rationale for a slightly different 
approach focusing on design reuse instead of code reuse. Next, we 
present OOHDM-frame, a syntax for defining the hot-spots of generic 
Web application designs. We illustrate the use of OOHDM-frame with 
a case study in the field of electronic commerce. We finally discuss 
how to implement Web design frameworks in different kinds of Web
platforms.



1 Introduction 

Building complex Web applications such as e-commerce applications is a time 
consuming task. We must carefully design their navigational architecture and user 
interface if we want them to be usable. We must understand the user tasks while he is 
navigating the hyperspace to decide which navigation facilities we should include; for 
example we may consider defining indexes, guided tours, landmarks, etc. according
to the user's needs. The interface should help the user browse through the sea of
information by giving him cues and feed-back on his actions, and by presenting the 
information in a clear and meaningful way. Moreover, this kind of application also 
includes complex behaviors, as they not only deal with buying or selling, but they are 
also integrated with the company’s internal business; often providing different views 
of corporate databases, and acting as integrators of other applications. Another 
dimension in which these applications are different from what we may call 
"conventional" software is the need to reduce deployment and delivery times.
Applications in the Web must be built quickly and with zero defects. We must 
improve not only development but also debugging and testing times. 

To make matters worse, building applications in the Web involves using a myriad 
of different technologies such as mark-up languages (like HTML or XML), scripting 
languages (JavaScript, Perl), general purpose object-oriented languages (Java),
relational databases, etc. We should find ways to improve the process of building this
kind of application by systematically reusing both application code and design
structures.

We have been designing Web applications using the Object Oriented Hypermedia 
Design Method (OOHDM) for some years [16, 15]. OOHDM considers Web 
applications as navigational views over an object model and provides some basic 
constructs for navigation (contexts, indexes, etc) and for user interface design. Using 
OOHDM we can apply well-known object-oriented software engineering practices to 
the construction of applications involving navigation. In the context of OOHDM we 
have been looking at ways to maximize reuse in the development process, since we 
have observed a certain degree of commonality among solutions in similar application 
domains. For example, most online stores have similar navigation structures, and they 
provide similar functions to their users. 

In this context we have found many recurrent patterns in Web applications; we 
have recorded them using a mixture of the GOF [2] and Alexandrian [Alexander??] 
styles. (See for example [11, 12]). We have found that micro-architectural reuse in 
Web applications is really feasible. However, if we want to move to architectural or 
design reuse, we need other concepts and tools in order to reason in terms of 
compositions of abstract and concrete Web application elements. 

In this paper we introduce Web design frameworks as a novel concept to push 
design reuse in Web applications. We first review object-oriented frameworks and 
compare them with Web design frameworks. We next present OOHDM-Frame, a 
notation to specify Web design frameworks, and show an example in the field of 
electronic commerce. Then, we show how to map Web design frameworks to Web 
application frameworks and to Web applications, and present some ongoing research 
issues in this area. 



2 Towards Web Design Frameworks 

There are different ways to achieve reuse in the context of Web applications. We can,
for example, reuse interface templates in the form of HTML or XML descriptions. We
can reuse information by accessing shared databases [Garzotto96]. We can go further
and reuse components that exhibit some non-trivial behavior. For example, we could
reuse code implementing shopping baskets in different e-commerce applications.
However, since many of the supporting technologies may not support reuse (e.g. there
are no inheritance or polymorphism mechanisms in HTML or XML, the code for
shopping baskets might not be found in a single component, etc.), it seems that we
cannot go far beyond these examples.

As a consequence the most important kind of reuse, design reuse, has been largely 
unexplored in Web applications, perhaps due to the non object-oriented nature of the 
Web. In a previous paper [12] we have introduced navigation patterns as a way to 
record, convey and reuse design experience. Though the kind of reuse provided by 
patterns is valuable, complex corporate applications need a way to maximize reuse of 
larger design structures. For example, the set of activities triggered when the user of 
an electronic store orders an item is usually similar in different stores. We should be
able to express these commonalities in such a way that only the specific aspects of a
particular store need to be designed or programmed.

In the following sections we introduce Web design frameworks as a solution to this
problem. We first review the state of the art in object-oriented frameworks and then 
highlight the differences with Web design frameworks. 

2.1 Object-Oriented Frameworks 

Object-oriented application frameworks are the state-of-the-art solution for building
high quality applications in a particular domain, by systematically reusing an abstract 
design for that domain [1,7]. An object-oriented (OO) application framework is a 
reusable design built from a set of abstract and concrete classes, and a model of object 
collaborations. A framework provides a set of classes that, when instantiated, work 
together to accomplish certain tasks in the intended domain. An application 
framework is thus the skeleton of a set of applications that can be customized by an 
application developer. 

When many different applications must be constructed in the same domain,
application frameworks provide "templates" for supporting their commonalities and
accommodating individual variations (differences). These "templates" usually have
the form of abstract classes that must be subclassed with concrete ones, or filled
in with "hook" methods that must be implemented by the application’s designer [10].
The framework's designer must understand the domain and be able to decouple the
concrete model of a particular application from the abstract model of the whole
domain. New applications can be built by simply plugging together framework and
specific application components. Application frameworks have been built in areas
such as user interface design, graphical editors, networks, financial applications, etc.
[1].

Let us suppose that we are building a framework for managing orders and delivery 
of products in different (non-electronic) stores. The framework will contain some 
abstract classes like Product, Client, Provider, Order, Invoice, etc. Their behavior will 
implement the usual flow of control in the store: when a client places an order for a 
product, a message is sent to the supplier, an invoice is generated, etc. 

For a particular application (store) in this domain, one will need to either
instantiate these classes or subclass them in order to adapt both their
structure and behavior to the particular features of this store, e.g. different kinds of
products, various payment policies, etc. This is usually achieved (in the framework)
by programming generic methods in abstract classes that are then used (in the specific
application) as templates in concrete subclasses.
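
The mechanism of generic methods completed by concrete subclasses can be sketched as follows. This is only an illustrative Python example under our own assumptions: the class `Order` is named in the text, but its methods, the hook `apply_payment_policy` and the store-specific subclass `CashDiscountOrder` are hypothetical.

```python
from abc import ABC, abstractmethod


class Order(ABC):
    """Framework class: fixes the flow of control for placing an order."""

    def __init__(self, product: str, price: float):
        self.product = product
        self.price = price
        self.log = []

    def place(self) -> float:
        # Generic (template) method: the sequence of steps is fixed by the framework.
        self.log.append(f"notify supplier about {self.product}")
        total = self.apply_payment_policy(self.price)
        self.log.append(f"invoice for {total:.2f}")
        return total

    @abstractmethod
    def apply_payment_policy(self, amount: float) -> float:
        """Hook method: each concrete store supplies its own payment policy."""


class CashDiscountOrder(Order):
    """Application class for a hypothetical store offering a 5% cash discount."""

    def apply_payment_policy(self, amount: float) -> float:
        return round(amount * 0.95, 2)


order = CashDiscountOrder("book", 20.0)
total = order.place()
```

The framework owns the sequence of steps in `place`, while the application only fills in the hook; this is the hot-spot style that the following sections contrast with Web design frameworks.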

This simple example helps to understand the problems with framework technology
if we want to move to the Web environment. The first is the need to adapt to a hybrid
environment (object-oriented frameworks are usually programmed in a single
programming language). In addition, Web applications involve another component,
their navigational structure [12], since we are interested not only in the behavior of
domain classes but also in the ways the user will navigate through them.



2.2 Why Web Design Frameworks 

Web environments are not fully object-oriented. In the WWW we will have to define 
HTML pages, scripts in some language (such as JavaScript or Perl), queries to a 
relational database, etc. Conceptual and Navigation objects may have to be mapped 
onto a relational store, and behaviors defined during design may have to be
programmed by mixing a scripting language, stored procedures, and so on. The main
consequence of this fact is that "conventional" object-oriented application framework
technology may still be inadequate in this domain, since we cannot assume a single
language environment as most frameworks do. Though there is a growing trend towards
"object-orienting" the WWW [IEEE99], we still need heuristics to perform these
translations.

It may happen that we can program the full application using an object-oriented 
language (e.g. Java). Even in this case we will still lack an important part of the 
application’s functionality: its navigational behavior. We have argued elsewhere [16, 
14] that Web application models require both an object (conceptual) model where we 
specify usual behavior, and a navigational model in which we define navigational 
components such as nodes, links, contexts, paths, etc. For Web applications to be
successful, the navigational structure must be carefully defined, and current object- 
oriented approaches do not provide primitives for navigation design. As a 
consequence, framework technology is not completely adequate for this domain. 

In the following sections we introduce Web design frameworks, which provide a 
bridge between current framework technology and Web environments. 



3 Components of a Web Design Framework Architecture 

3.1 Definition 

Let us consider a Web application as "a structured set of objects that may be
navigated, and processed, in order to achieve one or more tasks". A Web application
framework may be defined as "a generic definition of the possible application objects,
together with a generic definition of the application's navigational and processing
architecture". A Web application framework must then define the set of possible objects
to be navigated, how they can be structured in their navigation architecture, and how
they may behave. Current framework technology would allow us to stress object
relationships and behaviors in a specific programming language, but would not allow
us to specify navigation architectures.

We define a Web design framework as a "generic design of possible Web
application architectures, including conceptual, navigational and interface aspects, in
a given domain". Web design frameworks must be environment- and
language-independent.

As noted earlier, one of the important defining aspects of a framework is its set of
hot-spots, i.e., the places in the framework where the designer may introduce the
variations or differences for a particular application in the same domain as the
framework. We have taken the approach of modeling many applications in the same 
domain (e.g., discussion lists, online publications, online stores...) using OOHDM, 
and comparing the resulting specifications. From this comparison, it was possible to 
determine the similarities and differences between them, which in turn allowed us to 
identify what should be the hot spots in a framework that could subsume the set of 
applications in each particular domain. 

As a result, in order to define hot-spots for Web application frameworks, we used 
OOHDM models, namely Conceptual and Navigation, as a starting point. Before 
detailing hot-spot definitions, we briefly recapitulate some key concepts in the
OOHDM approach that will serve as the basis for hot-spot definition. 

The first key concept is that in a Web application, the user navigates over 
(navigation) objects that are views of conceptual objects; these views are defined 
opportunistically, according to particular user profiles and tasks. The second key 
concept is that navigation objects must be organized into useful structured sets, called 
contexts. The structure of these sets defines the intra-set navigational architecture, 
whereas the set of conceptual relations, which are mapped onto navigation links, 
defines the inter-set navigational architecture. 

Since sets can be defined in different ways, this induces different types of contexts: 

1. Simple class derived - includes all objects of a class that satisfy some property
   ranging over their attributes; e.g., "books with author = Umberto Eco", "CDs
   with performer = Rolling Stones", etc.

2. Simple link derived - includes all objects related to a given object; e.g.,
   "reviews on 'The Name of the Rose'", "CDs that were bought by persons who
   also bought 'Flashpoint'", etc.

3. Arbitrary - the set is defined by enumeration. For example, a guided tour
   showing some pictures in a virtual museum, or some outstanding books in a
   collection.

Many times, contexts appear in families or groups of related contexts; the most
common types are defined below.

4. Class derived group - a set of simple class derived contexts, where the
   defining property of each context is parameterized; e.g. "Products by
   Manufacturer", "Books by Keyword" (Manufacturer and Keyword can vary).
   (Notice that we are considering "Manufacturer" as an attribute of Product, as
   discussed previously.)

5. Link derived group - a set of link derived contexts, each of which is obtained by
   varying the source element of the link; e.g. "Book by Order" (Order can vary).
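
The context types listed above can be illustrated with a few lines of Python. This is a sketch under our own assumptions rather than OOHDM notation: the `Book` class, the sample data, the representation of links as (source, target) pairs and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    keyword: str


BOOKS = [
    Book("The Name of the Rose", "Umberto Eco", "novel"),
    Book("Foucault's Pendulum", "Umberto Eco", "novel"),
    Book("A Cookbook", "Some Author", "cooking"),  # made-up entry
]

# Made-up "reviews" link, stored as (review text, title of the reviewed book).
REVIEWS = [
    ("A medieval mystery", "The Name of the Rose"),
    ("Dense but rewarding", "Foucault's Pendulum"),
]


def class_derived(objects, predicate):
    """Simple class derived context: objects of a class satisfying a property."""
    return [o for o in objects if predicate(o)]


def link_derived(links, target):
    """Simple link derived context: objects related to a given object."""
    return [source for source, linked in links if linked == target]


def arbitrary(*objects):
    """Arbitrary context: the set is given by enumeration."""
    return list(objects)


def class_derived_group(objects, attribute):
    """Class derived group: one class derived context per value of the attribute."""
    group = {}
    for o in objects:
        group.setdefault(getattr(o, attribute), []).append(o)
    return group


eco_books = class_derived(BOOKS, lambda b: b.author == "Umberto Eco")
rose_reviews = link_derived(REVIEWS, "The Name of the Rose")
guided_tour = arbitrary(BOOKS[0], BOOKS[2])
books_by_keyword = class_derived_group(BOOKS, "keyword")
```

A link derived group would be built in the same way as `class_derived_group`, by calling `link_derived` once for each possible target object.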

A context is said to be dynamic if its elements may change during navigation. This
can happen for two reasons: because it is possible to explicitly add elements to or
remove elements from a context, or because it is possible to create new objects or links, or
alter existing ones. In the latter case, all class (respectively, link) derived contexts
automatically become dynamic.

A Web design framework may then be defined by an analogous set of models - a 
Conceptual Model, a Navigation Model, and rules for mappings between them. 
However, each of these models will be made up of different primitives than the ones 
used in OOHDM itself. 

It must be emphasized that web design frameworks, while following the same 
philosophy as application frameworks, use quite different mechanisms for hot-spot 
definition and instantiation. Whereas the latter use subclassing and class instantiation 
to instantiate hot-spots, web design frameworks use selective mapping and generic 
context instantiations, as will be explained next. 

For the sake of conciseness we do not include variability related to the Interface
model in this paper.

3.2 Conceptual Model

A first question that must be answered is what characterizes the application domain. 
The first step in the design process is to define a Conceptual Model, upon which 
applications will be built by defining navigational views for each particular user 
profile. This architecture (present in OOHDM) already constitutes part of a 
framework, since it is possible to build many applications starting from the same 
Conceptual Model. Therefore, we define the application domain as being the model 
characterized by the Conceptual Schema; it defines the abstract classes, possibly some 
concrete classes, and the relationships that make up the domain of application. These 
classes may include (applicative) behavior specification as well. The hot-spots of an
object-oriented framework for the application domain can be defined according
to [10].

In Fig. 1 we present a generic conceptual model for electronic stores. Notice that
genericity in the conceptual schema can be obtained by following well-known 
practices in object-oriented design [1]. In this paper we stress the novel aspect of Web 
design frameworks: the specification of generic navigation structures. 




Fig. 1. An example of a conceptual schema for a generic electronic online store 



3.3 Navigation Model 

A Navigational model is defined by a Navigation Class Schema (mapped from the
Conceptual Model), and by a Navigation Context Schema. The elements in the
Navigation Class Schema are abstract and concrete node classes, and links, defined as
views over the Conceptual Class Schema. Unlike in OOHDM, these classes
may contain optional elements (attributes or methods); links may also be optional.
Optional elements may be omitted when the framework is instantiated. Therefore, the
optional inclusion of Navigation Class Schema elements is a first hot-spot in a Web
design framework.

As an example, in the e-commerce domain, one may define a Conceptual Class
"Product" that has "Description_Image" and "Description_Text" attributes (see Fig.
1). It is possible to define, in the framework's Navigation Class Schema, that there is a
Navigation Class "Product", derived from the corresponding conceptual class, but
where the "Description_Image" is optional. This means that this framework can be
instantiated into applications that do not include an image of a product. Similarly, the
framework may specify that the link "Related Product", between products, is optional,
and therefore two different actual applications, one including it, another omitting it,
are valid instances of this framework. In Fig. 2 we show a possible navigational class
schema in this domain.

To generalize, the first hot-spot in a Web design framework is defined by
establishing the constraints on possible mappings between Conceptual and Navigation
Classes, stating among other things that certain elements of the Navigation Class
Schema are optional. These constraints must also address the issue of consistency
during framework instantiation, which will be discussed later.
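
One way to picture this first hot-spot is as a check that an application's navigation class keeps every mandatory element and adds nothing beyond the optional ones. The Python sketch below is illustrative only: the class `NavigationClassSpec` and its method are hypothetical, attributes and links are treated uniformly as named elements, and "Description_Text" is assumed to be mandatory because the text does not mark it as optional.

```python
from dataclasses import dataclass, field


@dataclass
class NavigationClassSpec:
    """Framework-level description of a navigation class with optional elements."""

    name: str
    required: set
    optional: set = field(default_factory=set)

    def is_valid_instance(self, elements: set) -> bool:
        """True if all required elements are kept and only optional ones are added."""
        return self.required <= elements <= (self.required | self.optional)


PRODUCT = NavigationClassSpec(
    name="Product",
    required={"Description_Text"},
    optional={"Description_Image", "Related Product"},
)

text_only_store = {"Description_Text"}
illustrated_store = {"Description_Text", "Description_Image", "Related Product"}
broken_store = {"Description_Image"}  # drops a required element
```

Both `text_only_store` and `illustrated_store` are accepted as instances of the same framework, which is exactly the variability that the optional elements are meant to allow.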




Fig. 2. An example Navigation Class Schema for an electronic online store. Dashed lines 
indicate optional links 

The second component of the Navigational model is the Navigation Context
Diagram. For Web design frameworks, it is necessary to generalize the concept of 
Navigation Context, replacing it with the concept of Generic Navigation Context. A 
Generic Navigation Context is the specification of a set of possible Navigation 
Contexts, subject to some constraints. This is another type of hot-spot in a Web 
design framework, which is exercised during instantiation by substituting the Generic 
Context by actual Navigation Contexts, all of them satisfying the Generic Context's 
constraints. The framework's Context Navigation Diagram is supplemented with a set 
of instantiation rules, which further constrain possible Context Diagrams that may be 
derived from it during the framework instantiation process. 

Consider for example an e-commerce application, where one is designing a 
product catalog section. One may define several contexts that group instances of 
Navigation Class "Product" in different ways, for instance, "Product by Category", 
"Product by Price", etc. Each of these defines a Navigation Context. A Generic 
Navigation Context can be "Any Class-Derived Context based on Product"; it is 
generic because it does not specify the particular property that is used to define each 
derived context. In addition, it constrains its concrete instances by requiring that all of 
them be based on some property over the attributes of Navigation Class "Product". 
When a given framework's Context Diagram includes concrete Navigation Contexts 
(i.e., non-generic), it means that all applications derived from this framework must 
include such contexts as defined in the specification, without variations. 
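
The substitution of a Generic Navigation Context by actual contexts can be sketched as a constraint check. The following Python fragment is a simplified illustration under our own assumptions: the class and function names are hypothetical, and the constraint "based on some property over the attributes of Product" is reduced to membership in an assumed attribute set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDerivedContext:
    name: str
    node_class: str
    attribute: str  # the property that defines the context


@dataclass(frozen=True)
class GenericClassDerivedContext:
    """Stands for 'any class derived context based on <node_class>'."""

    node_class: str
    allowed_attributes: frozenset

    def accepts(self, ctx: ClassDerivedContext) -> bool:
        return (ctx.node_class == self.node_class
                and ctx.attribute in self.allowed_attributes)


def instantiate(generic, contexts):
    """Replace a generic context by actual ones, rejecting any that violate it."""
    rejected = [c.name for c in contexts if not generic.accepts(c)]
    if rejected:
        raise ValueError(f"contexts violate the generic context: {rejected}")
    return list(contexts)


# Assumed attributes of the navigation class "Product".
GENERIC = GenericClassDerivedContext(
    "Product", frozenset({"category", "price", "manufacturer"})
)

catalog = instantiate(GENERIC, [
    ClassDerivedContext("Product by Category", "Product", "category"),
    ClassDerivedContext("Product by Price", "Product", "price"),
])
```

A concrete (non-generic) context in the framework's diagram would simply be copied into every application unchanged, so it needs no such check.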

Besides Navigation Contexts, a Context Diagram also contains the specification of 
Access Structures (indexes). By analogy, the framework's Context Diagram will also 
contain the definition of Generic Access Structures, which are generalizations of 
Access Structures. In Generic Access Structures, the criteria that may be varied are 
which elements are included, their ordering, whether the index is static or dynamic, 
and so on. Again, a Generic Access Structure may be substituted by several actual 
access structures, all of which satisfy its instantiation rules. The inclusion of concrete 
Access Structures in the framework's Context Diagram determines that all 
instantiations include that access structure; a common example is the "Main Menu" 
access structure, present in most frameworks. 

Summarizing, a framework's Navigation Context diagram is made up of Generic 
Navigation Contexts and (regular) Navigation Contexts, plus a set of instantiation 
rules. It defines all valid Context Diagrams of actual applications that can be obtained 
from the framework instantiation. 



4 The OOHDM-Frame Notation 

In order to specify Web design frameworks, we have defined a new set of models, 
called OOHDM-Frame. A framework specification in OOHDM-Frame is comprised 
of a Conceptual Model specification and a Navigation Model specification, together 
with instantiation rules. We have already discussed the Conceptual Model in the 
previous sub-section; it uses the same primitives and notation as OOHDM Conceptual 
Models plus hot-spots with the notation in [10]. Whereas generic behaviors are 
specified in the conceptual model, generic navigation architectures are specified in the 
navigational model. 

The Navigation Model in OOHDM-Frame is made up of a Navigation Class 
Schema, a Context Diagram, and a set of mapping and instantiation rules, controlling 
how the Navigation Class Schema is mapped onto the Conceptual Class Schema. The 
Navigation Class Schema is similar to the Conceptual Class Schema, except for the 
fact that Class attributes may be optional (marked with an "*") and Relations (links) 
can be optional (drawn with a dashed line). In addition, a Navigation Class in the 
Navigation Class Schema may also be sub-classed during instantiation, thus 
implementing at least in part the "traditional" hot-spot mechanism in OO frameworks. 
For example, class "Product" may be sub-classed in actual applications, which will 
allow the definition of the real products in each case. 

Let us briefly examine the notation used to represent generic contexts, and their 
respective context cards. 

The first kind is the Generic Simple Context. It may be substituted during 
instantiation by any simple context (class or link derived). The context card will detail 
the instantiation restrictions, such as whether there can be from 0 to n instances 
(cardinality); whether the resulting contexts are communicating (i.e., it is possible to 
switch from one to the other at any moment during navigation within either one of 
them); and the allowed types with respect to persistence (Fig. 3). 





Generic Context 

Cardinality: 0 to n 
Communicability: [0 | 1] 
Possible types: [static | persistent dynamic | session dynamic] [index access] 
Consistency/instantiation constraints: 
Type: [simple | group] 



Fig. 3. A Generic Context, and its specification card 

The cardinality specification may be used to indicate that a generic context must 
have at least one instantiation, e.g., 1 to n. Arbitrary contexts, i.e., those whose 
elements are enumerated, are also represented with the same notation. 
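
The restrictions recorded on a specification card lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions: the field names, the persistence labels, and the `violations` function are hypothetical and only mirror the cardinality and type restrictions described above; communicability is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ConcreteContext:
    name: str
    kind: str          # assumed labels: "simple" or "group"
    persistence: str   # assumed labels: "static", "persistent dynamic", "session dynamic"


@dataclass(frozen=True)
class GenericContextCard:
    name: str
    min_instances: int
    max_instances: Optional[int]        # None stands for "n" (unbounded)
    allowed_kinds: FrozenSet[str]
    allowed_persistence: FrozenSet[str]

    def violations(self, instances: List[ConcreteContext]) -> List[str]:
        """Return one message per restriction that the instantiation breaks."""
        problems: List[str] = []
        count = len(instances)
        if count < self.min_instances:
            problems.append(
                f"{self.name}: at least {self.min_instances} instantiation(s) required, got {count}"
            )
        if self.max_instances is not None and count > self.max_instances:
            problems.append(
                f"{self.name}: at most {self.max_instances} instantiation(s) allowed, got {count}"
            )
        for ctx in instances:
            if ctx.kind not in self.allowed_kinds:
                problems.append(f"{ctx.name}: type {ctx.kind!r} not allowed")
            if ctx.persistence not in self.allowed_persistence:
                problems.append(f"{ctx.name}: persistence {ctx.persistence!r} not allowed")
        return problems
```

A card with cardinality "1 to n" would be written with `min_instances=1` and `max_instances=None`; an empty list of instantiations then produces a single violation message, while a list of allowed contexts produces none.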

Fig. 4 shows the representation of other generic contexts. An instance creation or 
modification context is a dynamic context that allows the creation of new object 
instances, or changing the attributes of an existing object. In this case, the only hot 
spot is the choice of cardinality: 0 means it may not be instantiated, 1 means its 
instantiation is mandatory. 

Like contexts, access structures also have a generic counterpart, generic access 
structures. The only restriction placed on index instantiations is that they must be 
compatible with the context instantiations they point to. Since indexes may be 
hierarchical, it is also possible to make the actual hierarchy a hot spot. In this case, we 
use the notation shown in Fig. 4. 

Another type of hot spot is the property of a context of being protected or not. It 
should be recalled that contexts may have access restrictions for users of a certain 
type. This hot spot is denoted in the diagram with a double-edged oval next to the 
corresponding generic context or index. 

In the following section we will examine in more detail the actual process of 
framework instantiation, by looking at an example. 

Fig. 4. Other generic contexts and access structures and their specification cards 



5 Instantiating a Framework 




Fig. 5. The Context Diagram for an Online Store application framework 

This "generic" diagram is at an abstraction level that allows reasoning closer to the 
domain of online stores. The main goal of such sites is, evidently, to sell products. 
Therefore, a site must offer as many paths as possible leading to products; furthermore, 
once the reader ("consumer") has reached a product, he should be led to (or shown) as 
many additional products as possible. 

Generic context "Product by Property" is either a simple or a group class-derived 
context, which will typically be instantiated into one or more contexts that allow 
navigation among products according to certain properties (e.g., "Product by Price", 
"Product by Size", "Product by Color", etc.). Once within any of these, it is 
normally possible to navigate to other "Related Products" (e.g., accessories, matching 
products, etc.). There are several access structures that lead the reader into these 
contexts; typically, these are hierarchical access structures that reflect product 
sections (departments) in a real-world store. 

Additional paths leading to products can be offered by opportunistically grouping 
products according to some (arbitrary) criteria, such as "N.Y. Times Bestsellers List", 
"John's Recommendations", or "Promotions". Such groupings are modeled by the 
generic "Products by Reference" context, which can be reached through the generic 
hierarchical index "...: Generic Reference". 

The shopping basket and the order itself are modeled as concrete dynamic 
contexts, which are always present in any instantiated application (note the absence of 
dashed lines). In addition, access to the order is normally protected by some 
identification process (notice the double oval next to the "Order Form" context). 
Finally, some online stores may require customer identification at the entrance, which 
is modeled by making the "Main Menu" access structure optionally protected. 

Let us now look at how this framework may be instantiated into the "book" section 
of Amazon.com's website (http://www.amazon.com). We have deliberately chosen to 
exemplify with only a portion of that website, for reasons of clarity and space; for 
instance, we have not included user profile management that appears in the 
framework. Fig. 6 shows the context diagram for this section of Amazon.com's 
website. 

It can be readily seen that, as discussed in [13], this site does not explore the "Set 
Based Navigation" pattern (which is embodied in the "Navigation Context" 
primitive). When looking at a particular book, it is possible to navigate to several 
related books, which are accessible through indexes: "books that customers who 
purchased this book have also purchased" ("related" index); "books by the same 
author" ("author" index); "books with similar subjects" ("subject" index); "books 
recommended by auction and zShops participants" ("auctions" index). Regardless of 
which index is used, once the user has navigated from the index to the book node, 
context information is lost. 

Fig. 6. The Context Diagram for http://www.amazon.com, an instantiation of the diagram in Fig. 5 

Fig. 7. An example of a "Book by title" instance. The related books are accessible through a 
variety of indexes, as indicated. Only the top screenful of this (long) page is shown 

The main instantiation mappings for the Amazon.com website are shown in Fig. 8. 
Notice that a single generic index, "similar property", was instantiated into four 
different indexes, "author", "subject", "related", and "auction". "Generic References" 
are mapped onto a simple set of indexes "Publication: Recommended", which give 
the books recommended by several popular publications; the actual text of the 
recommendations is not included, hence the absence of the "Generic Reference" 
context itself. 



[Figure: on the framework side, the index "...: Generic Reference" and the arbitrary 
context "Generic Reference"; on the instantiation side, the index "Publication: 
Recommended" (Book).] 

Fig. 8. The instantiation mapping between the application framework context diagram in Fig. 5 
and the Amazon.com website shown in Fig. 6. Only the most interesting mapping for access 
structures is shown 



Readers familiar with online stores will notice that the approach taken by 
Amazon.com is present in many other online stores, resulting in very similar 
navigation diagrams; as an example, we cite http://www.etoys.com. 

In order to show that this application framework is quite generic, we have also 
instantiated it for Gap's online store, http://www.gap.com. This site has a 
straightforward but quite effective navigation architecture, as can be seen in the 
diagram shown in Fig. 9. 

Fig. 9. The context diagram for the Gap Online Store, http://www.gap.com 



This site makes interesting use of "Set Based Navigation", as can be observed 
when navigating in the "Related Products" context. Fig. 10 shows the same product, 
"pique polo shirt", in two different contexts: (a) "Related Products", since it is related 
to "relaxed fit pleated khakis", in category "Pants and Shorts"; and (b) "Product by 
category". Notice that the indexes of the original context ("Pants and Shorts" and 
"Shirts and Polos") remain on the left side of the screen. 




(a) (b) 

Fig. 10. An example of a Product, "pique polo shirt", in two different contexts: (a) as a related 
product to "relaxed fit pleated pants"; (b) in the "product by category" context 



A comparison between the framework context diagram in Fig. 5 and the 
instantiation in Fig. 9 is shown in Fig. 11. 

Fig. 11. The instantiation mapping between the application framework context diagram in Fig. 5 
and the Gap.com website shown in Fig. 9. Only the most interesting mapping for access 
structures is shown at the top 



6 From Design to Application Frameworks and to Web 
Applications 

Web design frameworks help to specify the abstract architecture of a family of Web 
applications. A framework includes the specification of both the common aspects of 
applications in the domain and the hot-spots where the specificities of a particular 
application are accommodated. Web design frameworks are powerful because they 
are not language or environment-dependent, i.e. we can use the framework to produce 
Web applications in different settings, including non-object oriented ones. 

There are many alternatives for producing running Web applications from a 
given framework. In this section we briefly discuss two of them: mapping the design 
framework onto an application framework (and then instantiating this framework), or 
instantiating the design framework into an OOHDM model, and then implementing 
the resulting model on the Web. 

6.1 Web Design Frameworks Mapped to Application Frameworks 

We have designed an object-oriented architecture that allows designers to implement 
Web application frameworks for specific domains (an early version is discussed in 
[3]; an alternative version is described in [9]). This architecture (and its Java 
implementation) contains classes that support the core OOHDM primitives (nodes, 
links, indexes and contexts); these classes can be plugged into domain-specific classes 
(Products, Orders, etc.) to extend their behavior with navigation functionality. Using 
this architecture, a designer should implement the generic conceptual model using an 
object-oriented programming language (e.g., Java), and for each particular application 
either sub-class or instantiate the domain classes and connect them with the OOHDM- 
specific classes that were derived from the generic navigational schema. 

In this architecture we decouple the domain and navigational model from the 
components that provide dynamic content generation on the Web (ranging from 
CGI/ISAPI to ASP/JSP). In this way a Web application framework can be designed to 
be independent of particular industry technologies (and can thus evolve seamlessly). 

In this architecture, the OOHDM server is focused on providing the ability to 
access nodes in different contexts, managing the navigation spaces and the linking among 
nodes. The HTML rendering task is performed on the Web server side by either a 
custom third-party CGI/ISAPI module or dynamic HTML page servers like ASP or 
JSP. A similar approach has been used in [4]. 
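
The separation described in this section can be sketched as follows. This is an illustration only, written in Python rather than the Java of the actual implementation; the class names `Context` and `Node` and the function `render_html` are hypothetical and do not reproduce the architecture of [3] or [9]. The domain class knows nothing about navigation, the navigation classes know nothing about HTML, and rendering is a separate function that could be replaced by any page-generation technology.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass(frozen=True)
class Product:
    """Domain class: no navigation or rendering logic."""
    name: str
    category: str


@dataclass
class Node:
    """A domain object as seen inside one particular navigation context."""
    content: Product
    context: "Context"
    position: int

    def _at(self, index: int) -> Optional["Node"]:
        members = self.context.members
        if 0 <= index < len(members):
            return Node(members[index], self.context, index)
        return None

    def next(self) -> Optional["Node"]:
        return self._at(self.position + 1)

    def previous(self) -> Optional["Node"]:
        return self._at(self.position - 1)


class Context:
    """An ordered set of domain objects that can be traversed."""

    def __init__(self, name: str, members: List[Product]):
        self.name = name
        self.members = list(members)

    def node_for(self, obj: Product) -> Node:
        return Node(obj, self, self.members.index(obj))


def render_html(node: Node) -> str:
    """Rendering is kept outside the navigation layer so that it can be swapped."""
    parts = [
        f"<h1>{escape(node.content.name)}</h1>",
        f"<p>Context: {escape(node.context.name)}</p>",
    ]
    nxt = node.next()
    if nxt is not None:
        parts.append(f"<p>Next: {escape(nxt.content.name)}</p>")
    return "".join(parts)
```

Because a `Node` carries its context, the same product reached through "Product by Category" and through "Related Products" has different neighbors, which is the behavior that Set Based Navigation relies on.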



6.2 Instantiating a Design Framework into a Web Application Using a 
Development Environment 

Another possible implementation of a particular Web application can be obtained by 
directly implementing an instantiated Web design framework using standard Web 
tools. We have been using OOHDM-Web [17] for this purpose. In OOHDM-Web, a 
complete OOHDM design is represented using special purpose data structures, which 
are nested lists of attribute-value pairs. These data structures contain class definitions 
(including InContext classes), navigation context definitions, access structure 
definitions and interface definitions. These definitions include the description of 
database entries that store instance data. 

Context definitions comprise the query definition that selects the elements that 
belong to the context; the same is true for access structure definitions. Interface 
definitions are mixed HTML templates, one for each class in each context where it 
appears. The mixed HTML template intersperses pure HTML formatting instructions 
with function calls to a library of pre-defined functions that are part of the OOHDM- 
Web environment. These functions allow retrieval of object attributes, or reference to 
other objects in specified contexts. Reference functions are defined in such a way 
that, when activated by the user, they cause the exhibition of the destination object in 
the appropriate context, using the template defined for that context. 
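
A mixed template of this kind can be imitated in a few lines. The sketch below is not the OOHDM-Web template language: the `{{function argument}}` placeholder syntax, the function names `attr` and `ref`, and the URL layout are all invented for illustration. It only shows the general mechanism of interspersing plain HTML with calls to a library of predefined functions.

```python
import re
from html import escape
from typing import Callable, Dict
from urllib.parse import quote

# Invented placeholder syntax: {{function argument}}
TEMPLATE = "<h1>{{attr name}}</h1><p>{{attr price}}</p>{{ref next}}"


def attr(obj: dict, name: str, context_name: str) -> str:
    """Retrieve an attribute of the current object."""
    return escape(str(obj[name]))


def ref(obj: dict, name: str, context_name: str) -> str:
    """Link to another object, keeping the current context in the (assumed) URL."""
    target = str(obj[name])
    return f'<a href="/{quote(context_name)}/{quote(target)}">{escape(target)}</a>'


LIBRARY: Dict[str, Callable[[dict, str, str], str]] = {"attr": attr, "ref": ref}


def render(template: str, obj: dict, context_name: str) -> str:
    """Replace every placeholder with the result of the named library function."""

    def call(match: "re.Match[str]") -> str:
        function_name, argument = match.group(1), match.group(2)
        return LIBRARY[function_name](obj, argument, context_name)

    return re.sub(r"\{\{(\w+) (\w+)\}\}", call, template)
```

Since the context name travels with every reference, following the generated link can display the destination object with the template defined for that same context.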

Following the same approach, we have defined OOHDM-Frame in a similar way, 
substituting generic definitions for the concrete ones whenever necessary. The 
resulting representation describes the generic design of the framework in question. 
The instantiation process will substitute the generic definitions in the framework by 
the definitions (using the OOHDM-Web representation) of their corresponding 
instantiated elements. For example, a generic class-derived context can be substituted 
by two (or more) class-derived contexts in the instantiated framework; this is 
achieved by actually replacing, in the data structure that describes the framework, the 
generic context description by the descriptions of the two concrete contexts, using the 
OOHDM-Frame format. 
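
The substitution step can be pictured with a toy data structure. The sketch below uses Python dictionaries as a stand-in for the nested lists of attribute-value pairs; the key names (`"contexts"`, `"generic"`, `"query"`) and the function `instantiate` are assumptions made for illustration and are not the actual OOHDM-Web or OOHDM-Frame format.

```python
from copy import deepcopy
from typing import Dict, List

# A framework description with one generic and one concrete context (hypothetical keys).
FRAMEWORK = {
    "name": "Online Store",
    "contexts": [
        {"name": "Product by Property", "generic": True, "base_class": "Product"},
        {"name": "Shopping Basket", "generic": False},
    ],
}


def instantiate(framework: dict, substitutions: Dict[str, List[dict]]) -> dict:
    """Return a new description where each generic context is replaced
    by the concrete context descriptions supplied for it."""
    application = deepcopy(framework)
    contexts: List[dict] = []
    for ctx in application["contexts"]:
        if ctx.get("generic"):
            if ctx["name"] not in substitutions:
                raise KeyError(f"hot spot {ctx['name']!r} has not been filled")
            # One generic description may expand into zero, one, or several concrete ones.
            contexts.extend(deepcopy(substitutions[ctx["name"]]))
        else:
            contexts.append(ctx)
    application["contexts"] = contexts
    return application
```

For example, supplying two descriptions ("Product by Price" and "Product by Size") for "Product by Property" yields an application description with three concrete contexts and no generic ones, while the framework description itself is left unchanged. Checking the supplied descriptions against the instantiation constraints is not covered by this sketch.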

At the end of this process, when all hot-spots have been plugged into the 
corresponding concrete application elements, the resulting data structure is a valid 
OOHDM-Web representation of the final instantiated application, ready to be used. 
Our current implementation does not automatically support all constraint 
verifications, which must be done manually by the designer when instantiating the 
framework. 



7 Concluding Remarks and Further Work 

In this paper we have introduced Web design frameworks as a novel technology to 
further design reuse in Web applications. A Web framework contains the 
specification of both the behavior and navigational structure of a family of Web 
applications in a particular domain. We have also introduced OOHDM-Frame, a 
concise syntax that allows expressing generic OOHDM models that conform to a 
framework specification. We have shown how to instantiate a design framework into 
a particular application using the domain of online stores as an example. We finally 
showed that Web design frameworks can be mapped in a straightforward way into 
application frameworks by showing a specific architecture. 

This architecture allows a designer to implement an object-oriented framework for 
a particular application domain in much the same way as if the applications were not 
meant to run on the Web. The designer can then plug the application 
classes into Web-specific components (implementing nodes, links and contexts) in 
order to deploy a running Web application. Web design frameworks can also be 
directly mapped into an application by using the OOHDM-Web environment. 

(Generic) Navigational Contexts are among the most important (if not the most 
important) architectural components in Web design frameworks. Contexts are recurrent 
patterns in Web applications, as they usually deal with sets of similar objects (products 
in a store, paintings in a museum's room, etc.). The notion that patterns contribute to 
defining the architecture of complex applications is not new [8], though it is only now being 
perceived in the Web community (see for instance [6, 5]). We are now incorporating 
other navigation patterns into OOHDM-Frame to enhance its expressive power. 

We are also improving our support architectures for Web design frameworks; we 
strongly believe that development, delivery and maintenance times in the Web 
domain require reuse-centric approaches. The systematic reuse of semi-complete 
design structures, as described by Web design frameworks, is a key approach for 
maximizing reuse in Web application development. 